The basis for this code is licensed under the BSD 2-Clause "Simplified" License.
I haven't looked at this yet, but it sounds like we might need a TPN (third-party notices) file in this folder, e.g. https://github.com/dotnet/corefx/blob/master/src/System.Private.Xml/tests/Xslt/TestFiles/TestData/THIRD-PARTY-NOTICES
I didn't review the implementation's correctness. This was just a quick skim looking for reliability issues.
do
{
    AssertRead<Vector256<sbyte>>(ref src, ref srcStart, sourceLength);
    Vector256<sbyte> str = Unsafe.As<byte, Vector256<sbyte>>(ref src);
Only use Unsafe.As if the data is known to be aligned. Use Unsafe.ReadUnaligned otherwise.
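A minimal sketch of the reviewer's suggestion, assuming .NET Core 3.0 or later and C# 9 top-level statements: `Unsafe.ReadUnaligned<T>(ref byte)` reads a value without assuming any alignment, whereas dereferencing `Unsafe.As<byte, Vector256<sbyte>>(ref src)` implicitly treats the location as a `Vector256<sbyte>`. The helper `LoadVector256` is hypothetical and not part of the PR.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper: loads 32 bytes starting at an arbitrary byte offset.
// Unsafe.ReadUnaligned makes no alignment assumption about the source location.
static Vector256<sbyte> LoadVector256(ReadOnlySpan<byte> buffer, int offset)
{
    // Bounds check first, so the ref passed to ReadUnaligned always has 32 readable bytes.
    if ((uint)offset > (uint)buffer.Length || buffer.Length - offset < Vector256<sbyte>.Count)
        throw new ArgumentOutOfRangeException(nameof(offset));

    ref byte start = ref MemoryMarshal.GetReference(buffer);
    return Unsafe.ReadUnaligned<Vector256<sbyte>>(ref Unsafe.Add(ref start, offset));
}

// Usage: offset 1 is deliberately misaligned relative to the start of the data.
byte[] data = new byte[64];
for (int i = 0; i < data.Length; i++)
    data[i] = (byte)i;

Vector256<sbyte> v = LoadVector256(data, 1);
Console.WriteLine(v.GetElement(0));   // 1
Console.WriteLine(v.GetElement(31));  // 32
```

As the reply below notes, on x86 the JIT emits unaligned moves either way, so the choice here is mostly about stating intent explicitly.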
Here it should not matter: on x86, unaligned reads/writes are emitted, and this section of the code only executes if SSSE3 or AVX2 is available.
Anyway, I changed it to make it more obvious that unaligned moves are used.
Oops, I missed a point: https://github.com/dotnet/coreclr/issues/21132 (I forgot to copy over the comment for this). So Unsafe.As is the faster option here.
BTW: Would it be safer if Unsafe.As emitted unaligned reads/writes?
{
    int vectorElements = Unsafe.SizeOf<TVector>();
    ref byte readEnd = ref Unsafe.Add(ref src, vectorElements);
    ref byte srcEnd = ref Unsafe.Add(ref srcStart, srcLength + 1);
The GC might not be able to track this ref. Since you're checking for error conditions, I recommend using raw pointers rather than GC-tracked refs, rewriting this method in terms of the 'fixed' statement. Also check for boundary conditions like integer overflows.
(Same comment for AssertWrite.)
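The pointer-based check suggested above could look roughly like this. It is a sketch only: `CanReadVector` and `Check` are hypothetical names, the project must enable `AllowUnsafeBlocks`, and the buffer is assumed to be pinned by the caller. Comparing the remaining byte count, rather than computing `src + vectorSize`, avoids both integer overflow and forming a pointer past the end.

```csharp
using System;

// Hypothetical pointer-based replacement for AssertRead (requires AllowUnsafeBlocks).
// The buffer must already be pinned by the caller. It compares the number of bytes
// remaining after src with the vector size, so it never computes src + vectorSize.
static unsafe bool CanReadVector(byte* src, byte* srcStart, int srcLength, int vectorSize)
{
    if (srcLength < 0 || vectorSize < 0 || src < srcStart)
        return false;

    long offset = src - srcStart;          // pointer difference is a long (byte count here)
    long remaining = srcLength - offset;   // negative if src is already past the end
    return remaining >= vectorSize;
}

// Pins the array once and asks whether vectorSize bytes can be read at index.
static unsafe bool Check(byte[] buffer, int index, int vectorSize)
{
    fixed (byte* start = buffer)
    {
        return CanReadVector(start + index, start, buffer.Length, vectorSize);
    }
}

Console.WriteLine(Check(new byte[40], 8, 32));   // True: exactly 32 bytes remain
Console.WriteLine(Check(new byte[40], 9, 32));   // False: only 31 bytes remain
```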
int sourceIndex = 0;
int destIndex = 0;
// max. 2 padding chars
if (destLength + 2 < decodedLength)
destLength + 2 could overflow and result in a negative value being compared. This doesn't appear to be handled properly later in the method.
I fixed this, thanks.
Even without the fix, this should not be a problem:
// This should never overflow since destLength here is less than int.MaxValue / 4 * 3 (i.e. 1610612733)
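To illustrate the overflow concern: C# `int` arithmetic is unchecked by default (unless the project enables `CheckForOverflowUnderflow`), so `destLength + 2` silently wraps near `int.MaxValue`. A minimal sketch with hypothetical helper names, comparing the expression from the PR with a variant that widens to `long` first; it also evaluates the bound quoted in the code comment.

```csharp
using System;

// The check from the PR: destLength + 2 wraps around for values near int.MaxValue,
// because int arithmetic is unchecked by default.
static bool NaiveTooSmall(int destLength, int decodedLength) => destLength + 2 < decodedLength;

// Overflow-safe variant (hypothetical): widen to long before adding.
static bool SafeTooSmall(int destLength, int decodedLength) => (long)destLength + 2 < decodedLength;

int large = int.MaxValue;
Console.WriteLine(NaiveTooSmall(large, 10));  // True: int.MaxValue + 2 wrapped to -2147483647
Console.WriteLine(SafeTooSmall(large, 10));   // False
Console.WriteLine(int.MaxValue / 4 * 3);      // 1610612733, the bound quoted in the code comment
```

If, as the code comment says, `destLength` is already known to be below 1610612733 at that point, the naive form cannot wrap; the widened form simply removes the need for that argument.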
str = Avx2.PermuteVar8x32(@out, permuteVec).AsSByte();

AssertWrite<Vector256<sbyte>>(ref destBytes, ref destStart, destLength);
Unsafe.As<byte, Vector256<sbyte>>(ref destBytes) = str;
Same comment re: Unsafe.As here.
It actually shouldn't matter: we will only emit an unaligned move here, and the conditions above ensure this only happens on x86 (where hardware support exists).
(For completeness) https://github.com/dotnet/coreclr/issues/21132
{
    ref byte srcStart = ref src;
    ref byte destStart = ref destBytes;
    ref byte simdSrcEnd = ref Unsafe.Add(ref src, (IntPtr)((uint)sourceLength - 45 + 1));
When calling Unsafe.Add, it's legal to create a ref that points to just past the end of the buffer. For example, assume I have a Span<byte> span of length 5. Consider the following:
ref byte a = ref MemoryMarshal.GetReference(span); // &span[0], valid ref
ref byte b = ref Unsafe.Add(ref a, 5);             // &span[5], valid ref
ref byte c = ref Unsafe.Add(ref b, 1);             // &span[6], *invalid* ref
In the above example, both a and b are valid GC-tracked refs. Since b points to memory outside the buffer, it must not be dereferenced. But since b points just beyond the end of the buffer, the GC can still track it, so operations like Unsafe.IsAddressLessThan(ref a, ref b) will still work as expected.
c, on the other hand, points further than just beyond the end of the buffer, so the GC cannot track it. If the underlying object moves in memory, the GC is guaranteed to keep a and b in sync, but it makes no such guarantees for c. Therefore, comparing a and c (or b and c) against each other results in undefined behavior.
The reason I mention this is that we need to be careful when we're creating refs that might be beyond the bounds of the buffer. I don't know from this particular call site if the call is valid from a GC-tracking perspective, so I wanted to draw your attention to it so that you can verify it.
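The valid-ref pattern described above can be sketched as a loop that compares a moving ref against a one-past-the-end ref (the `b` case) and only dereferences while `current < end`; it never forms anything like `c`. `Sum` is a hypothetical example function, and the sketch assumes a modern .NET where `System.Runtime.CompilerServices.Unsafe` is available without an extra package.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical example: sums a span by walking a ref from the first element to a
// ref that points one past the last element. Creating and comparing the end ref is
// fine; it is never dereferenced, and no ref beyond it is ever formed.
static int Sum(ReadOnlySpan<byte> span)
{
    ref byte current = ref MemoryMarshal.GetReference(span);
    ref byte end = ref Unsafe.Add(ref current, span.Length);   // like `b` above

    int sum = 0;
    while (Unsafe.IsAddressLessThan(ref current, ref end))
    {
        sum += current;                             // dereference only while current < end
        current = ref Unsafe.Add(ref current, 1);   // ref reassignment (C# 7.3+)
    }
    return sum;
}

Console.WriteLine(Sum(new byte[] { 1, 2, 3, 4, 5 }));   // 15
```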
(This same comment applies to other instances of Unsafe.Add.)
Perfect explanation, thank you!
It is always within the bounds of the buffer.
In this case, sourceLength is guaranteed to be >= 45, so simdSrcEnd is within the buffer.
The stride is 32, so advancing by it also stays within the buffer.
The other places have the same guarantees: for encoding, the minimum length is >= the register size and the stride is register size / 4 * 3, i.e. less than the register size.
For decoding, the stride is the register size and the minimum length is register size + at most two padding chars + the zeros that are written (see the comments here and here), so 24 (SSSE3) or 45 (AVX2).
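The bounds argument for the AVX2 decoding path can be checked mechanically. This sketch models only the reads, under the assumptions stated above: minimum length 45, `simdSrcEnd` at offset `sourceLength - 45 + 1`, a stride of 32, and a 32-byte read at each position while `src < simdSrcEnd`. The real loop body may differ; `MaxReadEnd` is a hypothetical helper.

```csharp
using System;

// Models the AVX2 decoding loop only in terms of its reads:
//   simdSrcEnd = src + (sourceLength - 45 + 1)
//   while (src < simdSrcEnd) { read 32 bytes at src; src += 32; }
// Returns the largest exclusive end offset of any read (0 if the loop never runs).
static int MaxReadEnd(int sourceLength, int minLength = 45, int stride = 32, int readSize = 32)
{
    int simdSrcEnd = sourceLength - minLength + 1;   // offset of simdSrcEnd from the start
    int maxEnd = 0;
    for (int pos = 0; pos < simdSrcEnd; pos += stride)
        maxEnd = Math.Max(maxEnd, pos + readSize);
    return maxEnd;
}

for (int n = 45; n <= 1000; n++)
{
    if (MaxReadEnd(n) > n)
        throw new Exception($"read past the end for sourceLength {n}");
}
Console.WriteLine("all reads stay in bounds for lengths 45..1000");
```

Algebraically, the last read starts at offset `sourceLength - 45` at most and therefore ends at `sourceLength - 13` at most, which leaves 13 bytes of slack.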
These are tests. The shipping bits need to be done in a different way; https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/contributing.md#copying-files-from-other-projects describes the proper way to do it.
private static unsafe void AssertRead<TVector>(ref byte src, ref byte srcStart, int srcLength)
{
    fixed (byte* pSrc = &src)
    fixed (byte* pSrcStart = &srcStart)
I think you can just use Unsafe.IsAddressGreaterThan and not need to pin.
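A sketch of the reviewer's alternative, a ref-based check with no pinning. It forms only `srcStart + srcLength` (one past the end, which the GC can still track) and measures the remaining distance with `Unsafe.ByteOffset` instead of computing `src + size`. `CanRead` is a hypothetical name, and the sketch assumes the caller passes refs into the same buffer.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical ref-based read check, no pinning. The only derived ref is srcEnd,
// which points one past the end of the buffer and can still be tracked by the GC.
static bool CanRead<TVector>(ref byte src, ref byte srcStart, int srcLength)
{
    if (srcLength < 0)
        return false;

    ref byte srcEnd = ref Unsafe.Add(ref srcStart, srcLength);

    // src must lie within [srcStart, srcEnd].
    if (Unsafe.IsAddressLessThan(ref src, ref srcStart) ||
        Unsafe.IsAddressGreaterThan(ref src, ref srcEnd))
        return false;

    // Bytes between src and the end, computed without forming src + size.
    long remaining = (long)Unsafe.ByteOffset(ref src, ref srcEnd);
    return remaining >= Unsafe.SizeOf<TVector>();
}

byte[] buffer = new byte[40];
Console.WriteLine(CanRead<Guid>(ref buffer[24], ref buffer[0], buffer.Length));  // True: 16 bytes left
Console.WriteLine(CanRead<Guid>(ref buffer[25], ref buffer[0], buffer.Length));  // False: 15 bytes left
```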
This was changed in 7115c84 because of #34529 (comment).
Personally, I prefer the Unsafe variant (it just needs to be fixed to be GC-tracked correctly).
Shall I revert this part?
I would recommend pinning once at the entry of the public method and using regular pointers throughout the rest of the code.
The tricky byref arithmetic is error-prone, and it is not worth using here. It is worth using only in the lowest-level methods, where the few extra instructions that fixed compiles into show up.
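A minimal sketch of the "pin once at the entry" structure, assuming `AllowUnsafeBlocks` is enabled. The operation (ASCII upper-casing) is a stand-in chosen for brevity, not Base64, and `CopyUpper` and `CopyUpperCore` are hypothetical. The public method validates lengths and pins both spans once; the core routine uses only raw pointers.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical public entry point: validates, then pins both spans exactly once and
// hands raw pointers to the core routine (requires AllowUnsafeBlocks).
static unsafe int CopyUpper(ReadOnlySpan<byte> source, Span<byte> destination)
{
    if (destination.Length < source.Length)
        return -1;
    if (source.Length == 0)
        return 0;                       // early exit, like `goto DoneExit` in the PR

    fixed (byte* src = &MemoryMarshal.GetReference(source))
    fixed (byte* dest = &MemoryMarshal.GetReference(destination))
    {
        return CopyUpperCore(src, source.Length, dest);
    }
}

// Core routine: only plain pointer arithmetic, no byrefs. Upper-cases ASCII letters
// as a stand-in for the real work.
static unsafe int CopyUpperCore(byte* src, int length, byte* dest)
{
    byte* end = src + length;
    while (src < end)
    {
        byte b = *src++;
        *dest++ = (b >= 'a' && b <= 'z') ? (byte)(b - 32) : b;
    }
    return length;
}

byte[] output = new byte[6];
int count = CopyUpper(System.Text.Encoding.ASCII.GetBytes("heLLo!"), output);
Console.WriteLine(System.Text.Encoding.ASCII.GetString(output, 0, count));   // HELLO!
```

The early `Length == 0` return keeps empty inputs away from the pinned path, which matters once pinning goes through `MemoryMarshal.GetReference` rather than the span's own emptiness check.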
So make all the code (even the existing scalar code) use raw pointers?
I think so.
Have you looked at where it is losing the cycles? It is more than I would expect.
The dasm for the "work" is nearly identical.
For pinning, a bit more code is generated, but the slowdown is larger than expected, and larger than in other places. Maybe it comes from code and loop alignment?
Is there anything that can be tested?
I made several attempts and tweaks to get better results; this is the best I got so far.
The rep stosd in the prolog looks suspicious to me. https://github.com/dotnet/coreclr/issues/13827 may be similar, but I have to admit that I lack knowledge in this area.
Some of the overhead comes from the checks for empty Spans in Span pinning.
You can try direct pinning, like fixed (byte* srcBytes = &MemoryMarshal.GetReference(utf8)), to see whether it makes a difference.
Also, the encodingMap can stay on the byref plan for now. Eventually, it should be changed to a pre-initialized ReadOnlySpan, but that can be done as a separate change.
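The difference between the two pinning forms can be observed directly. A sketch, assuming `AllowUnsafeBlocks`: `fixed (byte* p = span)` goes through `Span<T>.GetPinnableReference()`, which returns a null reference for an empty span (this is the extra emptiness check), while `fixed (byte* p = &MemoryMarshal.GetReference(span))` pins the span's underlying reference without looking at the length. The helper names are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Pattern-based pinning: the compiler calls Span<T>.GetPinnableReference(),
// which returns a null reference when the span is empty.
static unsafe bool SpanPinIsNull(Span<byte> span)
{
    fixed (byte* p = span)
    {
        return p == null;
    }
}

// Direct pinning: no length check; p is whatever the span's reference is,
// so callers must handle Length == 0 themselves before dereferencing.
static unsafe bool DirectPinIsNull(Span<byte> span)
{
    fixed (byte* p = &MemoryMarshal.GetReference(span))
    {
        return p == null;
    }
}

Console.WriteLine(SpanPinIsNull(new byte[0]));    // True
Console.WriteLine(SpanPinIsNull(new byte[4]));    // False
Console.WriteLine(DirectPinIsNull(new byte[4]));  // False
```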
Some of the overhead comes from the checks for empty Spans in Span pinning.
This overhead is negligible and more or less within noise (see results for encoding, decoding).
With ee47609 (latest commit) we get results (encoding, decoding) equal to the pure byref version (before 0d2cbc4), and due to the tweaks, even an improvement for large data.
In the dasm, the difference is in the prolog. Excerpt of the diff for encoding (decoding is similar):
G_M39726_IG01:
push rbp
push r15
push r14
- push r13
push r12
push rbx
- sub rsp, 56
+ sub rsp, 48
vzeroupper
- lea rbp, [rsp+60H]
- mov r12, rcx
- mov r13, rdi
- lea rdi, [rbp-60H]
- mov ecx, 6
+ lea rbp, [rsp+50H]
xor rax, rax
- rep stosd
- mov rcx, r12
- mov rdi, r13
- mov bword ptr [rbp-38H], rdi
- mov qword ptr [rbp-30H], rsi
- mov bword ptr [rbp-48H], rdx
- mov qword ptr [rbp-40H], rcx
+ mov qword ptr [rbp-48H], rax
+ mov qword ptr [rbp-50H], rax
+ mov bword ptr [rbp-30H], rdi
+ mov qword ptr [rbp-28H], rsi
+ mov bword ptr [rbp-40H], rdx
+ mov qword ptr [rbp-38H], rcx
mov rbx, r8
mov r14, r9
G_M39726_IG02:
- cmp dword ptr [rbp-30H], 0
- ja SHORT G_M39726_IG04
+ cmp dword ptr [rbp-28H], 0
+ ja SHORT G_M39753_IG04
xor eax, eax
mov dword ptr [rbx], eax
mov dword ptr [r14], eax
-; Total bytes of code 1265, prolog size 69 for method Base64:EncodeToUtf8(struct,struct,byref,byref,bool):int
+; Total bytes of code 1209, prolog size 52 for method Base64:EncodeToUtf8(struct,struct,byref,byref,bool):int
; ============================================================
Full dasm-diffs: encoding, decoding
Maybe rep stosd really was the cause of this slowdown. A brief search showed that it has quite a high startup overhead (~35 cycles) [1] and is sensitive to alignment [2]. Further info in [3].
Aside: it seems that rep stosd is emitted when there are three fixed statements (I didn't investigate deeply, just checked with a simple test).
pre-initialized ReadOnlySpan
I tried this for encoding and decoding (it also works for sbyte).
There is some strange codegen, with unnecessary stack spills, redundant loads (displayed here as 0xD1FFAB1E, but it is actually the same address), and multiple calls to CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE.
ROS as a local was also tried, with the same result as ROS as a property.
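For reference, the pre-initialized `ReadOnlySpan` pattern: when a `new byte[] { ... }` of constants is converted directly to `ReadOnlySpan<byte>`, the C# compiler can point the span at data embedded in the assembly instead of allocating an array (this applies to single-byte element types such as `byte` and `sbyte`). A sketch of the standard Base64 alphabet in that form; the thread used a static property, but this sketch uses a local function `EncodingMap` (hypothetical name) so it fits in top-level statements.

```csharp
using System;
using System.Text;

// Standard Base64 alphabet as a ReadOnlySpan<byte> over constant data. Because the
// byte[] of constants is converted directly to ReadOnlySpan<byte>, the compiler can
// reference a blob in the assembly instead of allocating an array on every call.
static ReadOnlySpan<byte> EncodingMap() => new byte[]
{
    65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,                  // A-P
    81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 97, 98, 99, 100, 101, 102,               // Q-Z, a-f
    103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118,  // g-v
    119, 120, 121, 122, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 43, 47,              // w-z, 0-9, '+', '/'
};

ReadOnlySpan<byte> map = EncodingMap();
Console.WriteLine(map.Length);                     // 64
Console.WriteLine(Encoding.ASCII.GetString(map));  // ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
```

Whether the JIT then keeps the span's base address in a register inside a hot loop is a separate question, which is what the codegen observations above are about.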
if (utf8.Length == 0)
    goto DoneExit;
if (Avx2.IsSupported && maxSrcLength >= 45)
Is the length here taking into account other potential drawbacks of executing 256-bit instructions?
Some examples are:
- For unaligned data, reads/writes crossing a cache-line boundary will happen twice as frequently (same with crossing page boundaries, for large enough data)
- Additional saving/restoring of the upper 128 bits across method call boundaries
- Additional vzeroupper calls
- Possible frequency downscaling when executing a "heavy" 256-bit workload (this one tends to make micro-benchmarks look good, but real-world scenarios can actually regress)
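The first point can be quantified with a simple model, assuming 64-byte cache lines and access start addresses spread uniformly over a line: an access of `size` bytes crosses a line boundary for `size - 1` of the 64 possible offsets. `CrossingOffsets` is a hypothetical helper.

```csharp
using System;

// Counts how many of the possible starting offsets within a cache line make an
// access of `size` bytes straddle into the next line (assumes 64-byte lines).
static int CrossingOffsets(int size, int lineSize = 64)
{
    int count = 0;
    for (int offset = 0; offset < lineSize; offset++)
        if (offset + size > lineSize)
            count++;
    return count;
}

Console.WriteLine(CrossingOffsets(16));  // 15 of 64 offsets for a 128-bit access
Console.WriteLine(CrossingOffsets(32));  // 31 of 64 offsets for a 256-bit access
```

15 versus 31 offsets is roughly the "twice as frequently" mentioned in the list.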
The length here doesn't take any of your points / potential AVX drawbacks into account.
I thought about aligning the writes for encoding and the reads for decoding, as they are multiples of four, so (in theory) this could be done. But it's not easy to do without "eating" up quite a lot in a scalar way.
Do you have a suggestion for what to do with the length, i.e. what it should specifically take into account?
Do you have a suggestion what to do with the length / what it should take specifically into account?
The first three can probably just be profiled multiple times with varying ranges of input data. The last one is really hard to determine outside of profiling real-world scenarios.
But it's not easy to do this without "eating" up quite a lot in a scalar way.
Did you look at processing the leading/trailing elements via vectorization as well (which should give you at most 2 unaligned reads/writes)? I think you might have mentioned trying this on the other thread...
processing the leading/trailing elements via vectorization
You're correct that I tried this, but without success, as the bookkeeping produces more overhead than the scalar processing.
Base64 is a bit tricky to vectorize, as not all elements of the vector produce usable results. Some elements are just 0 and will get overwritten in the next iteration or in the scalar remainder. Then there's the padding. Taking all this into account produces quite a lot of overhead, or I'm not thinking cleverly enough to come up with an easy and correct solution.
@gfoidl what's the status of this PR? It seems to have been stuck for 1.5 months now, unless I missed something.
@karelz from my point of view it's ready for further review (except for the merge conflict due to the comments; I'll rebase it). @GrabYourPitchforks can you have another look here?
Rebased due to conflicts (from #35354)
@GrabYourPitchforks could you please take another look so we can shepherd this to merging?
In 925f7ed, ROSpan is only used for static vector data, not for the encoding/decoding maps. https://github.com/gfoidl/corefx/commit/a3dbc990a6d42c6e8c3934be3144913e3067000f is the change for the encoding/decoding maps, but (for me) the codegen is not ideal, as the ref to the static data isn't kept in a register:
G_M39788_IG13:
0FB602 movzx rax, byte ptr [rdx]
440FB64A01 movzx r9, byte ptr [rdx+1]
440FB65202 movzx r10, byte ptr [rdx+2]
C1E010 shl eax, 16
41C1E108 shl r9d, 8
410BC1 or eax, r9d
410BC2 or eax, r10d
448BC8 mov r9d, eax
41C1E912 shr r9d, 18
- 460FB60C0F movzx r9, byte ptr [rdi+r9]
+ 49BEF71B0898847F0000 mov r14, 0x7F8498081BF7
+ 420FB61C33 movzx r9, byte ptr [r9+r14]
448BD0 mov r10d, eax
41C1EA0C shr r10d, 12
4183E23F and r10d, 63
- 460FB61417 movzx r10, byte ptr [rdi+r10]
+ 49BFF71B0898847F0000 mov r15, 0x7F8498081BF7
+ 470FB6343E movzx r10, byte ptr [r10+r15]
448BD8 mov r11d, eax
41C1EB06 shr r11d, 6
4183E33F and r11d, 63
- 460FB61C1F movzx r11, byte ptr [rdi+r11]
+ 49BCF71B0898847F0000 mov r12, 0x7F8498081BF7
+ 470FB63C27 movzx r11, byte ptr [r11+r12]
83E03F and eax, 63
+ 0FB60407 movzx rax, byte ptr [rax+r12]
41C1E208 shl r10d, 8
450BCA or r9d, r10d
41C1E310 shl r11d, 16
450BCB or r9d, r11d
C1E018 shl eax, 24
410BC1 or eax, r9d
8901 mov dword ptr [rcx], eax
4883C203 add rdx, 3
4883C104 add rcx, 4
493BD0 cmp rdx, r8
7290 jb SHORT G_M39788_IG13
Is there any way to hint the JIT to keep the ref to the static data in a register? An advantage is that the setup of the method (i.e. outside the loops) gets smaller:
G_M39788_IG04:
- 488D7DD0 lea rdi, bword ptr [rbp-30H]
- 4C8B3F mov r15, bword ptr [rdi]
- 4C897DB8 mov bword ptr [rbp-48H], r15
- 488D7DC0 lea rdi, bword ptr [rbp-40H]
- 4C8B27 mov r12, bword ptr [rdi]
- 4C8965B0 mov bword ptr [rbp-50H], r12
- 48BFA00E8741697F0000 mov rdi, 0x7F6941870EA0
- BE04000000 mov esi, 4
- E82504C078 call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
- 48B8280B002C697F0000 mov rax, 0x7F692C000B28
- 488B38 mov rdi, gword ptr [rax]
- 837F0800 cmp dword ptr [rdi+8], 0
- 0F86FE030000 jbe G_M39788_IG25
- 4883C710 add rdi, 16
- 8B75D8 mov esi, dword ptr [rbp-28H]
- 8B4DC8 mov ecx, dword ptr [rbp-38H]
- 81FEFDFFFF5F cmp esi, 0x5FFFFFFD
- 7F2C jg SHORT G_M39788_IG06
- 81FEFDFFFF5F cmp esi, 0x5FFFFFFD
- 0F87D8030000 ja G_M39788_IG24
+ 488D45D0 lea rax, bword ptr [rbp-30H]
+ 488B38 mov rdi, bword ptr [rax]
+ 48897DB8 mov bword ptr [rbp-48H], rdi
+ 488D45C0 lea rax, bword ptr [rbp-40H]
+ 488B30 mov rsi, bword ptr [rax]
+ 488975B0 mov bword ptr [rbp-50H], rsi
+ 8B4DD8 mov ecx, dword ptr [rbp-28H]
+ 448B55C8 mov r10d, dword ptr [rbp-38H]
+ 81F9FDFFFF5F cmp ecx, 0x5FFFFFFD
+ 7F2D jg SHORT G_M39793_IG06
+ 81F9FDFFFF5F cmp ecx, 0x5FFFFFFD
+ 0F871E040000 ja G_M39793_IG24
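As an illustration of what the loop in the listing above computes, here is a scalar sketch that encodes whole 3-byte groups: it combines three bytes into 24 bits and looks up four 6-bit indices in the map. The map reference is loaded into a local once before the loop, as in the original byref code; whether the JIT then keeps it in a register is still its own decision, so this is not a guaranteed hint. `EncodeBlocks` is hypothetical, and for clarity it stores the four bytes individually instead of packing them into one 32-bit store as the listing does.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical scalar encoder for whole 3-byte groups (no padding handling).
static int EncodeBlocks(ReadOnlySpan<byte> source, Span<byte> destination, ReadOnlySpan<byte> encodingMap)
{
    int blocks = source.Length / 3;
    if (destination.Length < blocks * 4 || encodingMap.Length < 64)
        throw new ArgumentException("destination too small or map incomplete");

    // Load the map reference once, outside the loop (the JIT still decides register use).
    ref byte map = ref MemoryMarshal.GetReference(encodingMap);

    for (int i = 0; i < blocks; i++)
    {
        int s = i * 3, d = i * 4;
        int value = (source[s] << 16) | (source[s + 1] << 8) | source[s + 2];   // 24 bits

        destination[d]     = Unsafe.Add(ref map, value >> 18);          // bits 23..18
        destination[d + 1] = Unsafe.Add(ref map, (value >> 12) & 63);   // bits 17..12
        destination[d + 2] = Unsafe.Add(ref map, (value >> 6) & 63);    // bits 11..6
        destination[d + 3] = Unsafe.Add(ref map, value & 63);           // bits 5..0
    }
    return blocks * 4;
}

byte[] alphabet = Encoding.ASCII.GetBytes("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/");
byte[] encoded = new byte[8];
int written = EncodeBlocks(Encoding.ASCII.GetBytes("foobar"), encoded, alphabet);
Console.WriteLine(Encoding.ASCII.GetString(encoded, 0, written));   // Zm9vYmFy
```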
Since you are unsafe anyway,
The case of decoding 16 encoded bytes was not covered by tests, so incorrect code was committed earlier, resulting in DestinationTooSmall instead of the correct Done.
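A regression test for that case could look like the following sketch, using the public `System.Buffers.Text.Base64` API: 12 input bytes encode to exactly 16 Base64 bytes (no padding), and decoding them into a 12-byte destination should report `Done`. The input values are arbitrary.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

// 12 input bytes encode to exactly 16 Base64 bytes, with no padding.
byte[] original = new byte[12];
for (int i = 0; i < original.Length; i++)
    original[i] = (byte)(i * 17);

byte[] encoded = new byte[Base64.GetMaxEncodedToUtf8Length(original.Length)];   // 16
Base64.EncodeToUtf8(original, encoded, out _, out int encodedLength);

// Decoding those 16 bytes into a 12-byte destination should report Done,
// not DestinationTooSmall.
byte[] decoded = new byte[Base64.GetMaxDecodedFromUtf8Length(encodedLength)];    // 12
OperationStatus status = Base64.DecodeFromUtf8(encoded.AsSpan(0, encodedLength), decoded,
                                               out int consumed, out int written);

Console.WriteLine($"{status} {consumed} {written}");          // Done 16 12
Console.WriteLine(decoded.AsSpan().SequenceEqual(original));  // True
```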
So I got rid of the `rep stosd` in the prolog. Cf. #34529 (comment)
A few remaining nits, but overall LGTM.
Benchmark results didn't change from #34529 (comment).
dasm for encode:
; Assembly listing for method Base64:EncodeToUtf8(struct,struct,byref,byref,bool):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; optimized code
; rbp based frame
; fully interruptible
; Final local variable assignments
;
; V00 arg0 [V00 ] ( 5, 4 ) struct (16) [rbp-0x30] do-not-enreg[XSFB] addr-exposed ld-addr-op
; V01 arg1 [V01 ] ( 4, 3 ) struct (16) [rbp-0x40] do-not-enreg[XSFB] addr-exposed ld-addr-op
; V02 arg2 [V02,T20] ( 6, 4 ) byref -> r8
; V03 arg3 [V03,T21] ( 6, 4 ) byref -> r9
; V04 arg4 [V04,T65] ( 1, 0.50) bool -> [rbp+0x10]
; V05 loc0 [V05,T23] ( 8, 4 ) long -> rdi
; V06 loc1 [V06 ] ( 1, 0.50) byref -> [rbp-0x48] must-init pinned
; V07 loc2 [V07,T24] ( 6, 3 ) long -> rsi
; V08 loc3 [V08 ] ( 1, 0.50) byref -> [rbp-0x50] must-init pinned
; V09 loc4 [V09,T25] ( 6, 3 ) int -> rcx
; V10 loc5 [V10,T44] ( 3, 1.50) int -> r10
; V11 loc6 [V11,T28] ( 4, 2 ) int -> rax
; V12 loc7 [V12,T00] ( 28, 35 ) long -> registers ld-addr-op
; V13 loc8 [V13,T02] ( 16, 18.50) long -> r10 ld-addr-op
; V14 loc9 [V14,T26] ( 6, 3 ) long -> rcx
; V15 loc10 [V15,T17] ( 8, 7.50) long -> r11
;* V16 loc11 [V16,T52] ( 0, 0 ) byref -> zero-ref
; V17 loc12 [V17,T09] ( 6, 10 ) int -> rax
; V18 loc13 [V18,T22] ( 6, 6 ) long -> rax
;# V19 OutArgs [V19 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
;* V20 tmp1 [V20 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
;* V21 tmp2 [V21 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V22 tmp3 [V22 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V23 tmp4 [V23 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V24 tmp5 [V24,T76] ( 2, 2.50) simd32 -> mm0 "Inline stloc first use temp"
;* V25 tmp6 [V25 ] ( 0, 0 ) simd32 -> zero-ref "struct address for call/obj"
; V26 tmp7 [V26,T77] ( 2, 2.50) simd32 -> mm1 "Inline stloc first use temp"
;* V27 tmp8 [V27 ] ( 0, 0 ) simd32 -> zero-ref "struct address for call/obj"
; V28 tmp9 [V28,T78] ( 2, 2.50) simd32 -> mm2 "Inline stloc first use temp"
;* V29 tmp10 [V29 ] ( 0, 0 ) simd32 -> zero-ref "struct address for call/obj"
; V30 tmp11 [V30,T79] ( 2, 2.50) simd32 -> mm3 "Inline stloc first use temp"
;* V31 tmp12 [V31 ] ( 0, 0 ) simd32 -> zero-ref "struct address for call/obj"
; V32 tmp13 [V32,T80] ( 2, 2.50) simd32 -> mm4 "Inline stloc first use temp"
; V33 tmp14 [V33,T81] ( 2, 2.50) simd32 -> mm5 "Inline stloc first use temp"
; V34 tmp15 [V34,T82] ( 2, 2.50) simd32 -> mm6 "Inline stloc first use temp"
;* V35 tmp16 [V35 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V36 tmp17 [V36,T83] ( 2, 2.50) simd32 -> mm7 "Inline stloc first use temp"
; V37 tmp18 [V37,T10] ( 6, 9 ) long -> r10 "Inline stloc first use temp"
; V38 tmp19 [V38,T18] ( 5, 7 ) long -> rdx "Inline stloc first use temp"
; V39 tmp20 [V39,T66] ( 14, 23.50) simd32 -> mm8 "Inline stloc first use temp"
;* V40 tmp21 [V40 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V41 tmp22 [V41,T92] ( 2, 2 ) simd32 -> mm9 "struct address for call/obj"
; V42 tmp23 [V42,T68] ( 2, 4 ) simd32 -> mm9 "Inline stloc first use temp"
; V43 tmp24 [V43,T69] ( 2, 4 ) simd32 -> mm9 "Inline stloc first use temp"
; V44 tmp25 [V44,T70] ( 2, 4 ) simd32 -> mm9 "Inline stloc first use temp"
; V45 tmp26 [V45,T71] ( 2, 4 ) simd32 -> mm9 "Inline stloc first use temp"
;* V46 tmp27 [V46 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V47 tmp28 [V47 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V48 tmp29 [V48 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V49 tmp30 [V49 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V50 tmp31 [V50 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
; V51 tmp32 [V51,T93] ( 2, 1 ) simd32 -> mm1 "Inline return value spill temp"
; V52 tmp33 [V52,T94] ( 2, 1 ) simd16 -> mm1 "Inline stloc first use temp"
; V53 tmp34 [V53,T95] ( 2, 1 ) simd32 -> mm2 "Inline return value spill temp"
; V54 tmp35 [V54,T96] ( 2, 1 ) simd16 -> mm2 "Inline stloc first use temp"
; V55 tmp36 [V55,T97] ( 2, 1 ) simd32 -> mm3 "Inline return value spill temp"
; V56 tmp37 [V56,T98] ( 2, 1 ) simd16 -> mm3 "Inline stloc first use temp"
; V57 tmp38 [V57,T99] ( 2, 1 ) simd32 -> mm4 "Inline return value spill temp"
; V58 tmp39 [V58,T100] ( 2, 1 ) simd16 -> mm4 "Inline stloc first use temp"
; V59 tmp40 [V59,T101] ( 2, 1 ) simd32 -> mm5 "Inline return value spill temp"
; V60 tmp41 [V60,T102] ( 2, 1 ) simd16 -> mm5 "Inline stloc first use temp"
; V61 tmp42 [V61,T103] ( 2, 1 ) simd32 -> mm6 "Inline return value spill temp"
; V62 tmp43 [V62,T104] ( 2, 1 ) simd16 -> mm6 "Inline stloc first use temp"
;* V63 tmp44 [V63 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V64 tmp45 [V64 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V65 tmp46 [V65 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V66 tmp47 [V66 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V67 tmp48 [V67 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V68 tmp49 [V68 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V69 tmp50 [V69 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V70 tmp51 [V70 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V71 tmp52 [V71 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V72 tmp53 [V72 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V73 tmp54 [V73 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V74 tmp55 [V74,T84] ( 2, 2.50) simd16 -> mm0 "Inline stloc first use temp"
;* V75 tmp56 [V75 ] ( 0, 0 ) simd16 -> zero-ref "struct address for call/obj"
; V76 tmp57 [V76,T85] ( 2, 2.50) simd16 -> mm1 "Inline stloc first use temp"
;* V77 tmp58 [V77 ] ( 0, 0 ) simd16 -> zero-ref "struct address for call/obj"
; V78 tmp59 [V78,T86] ( 2, 2.50) simd16 -> mm2 "Inline stloc first use temp"
;* V79 tmp60 [V79 ] ( 0, 0 ) simd16 -> zero-ref "struct address for call/obj"
; V80 tmp61 [V80,T87] ( 2, 2.50) simd16 -> mm3 "Inline stloc first use temp"
;* V81 tmp62 [V81 ] ( 0, 0 ) simd16 -> zero-ref "struct address for call/obj"
; V82 tmp63 [V82,T88] ( 2, 2.50) simd16 -> mm4 "Inline stloc first use temp"
; V83 tmp64 [V83,T89] ( 2, 2.50) simd16 -> mm5 "Inline stloc first use temp"
; V84 tmp65 [V84,T90] ( 2, 2.50) simd16 -> mm6 "Inline stloc first use temp"
;* V85 tmp66 [V85 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V86 tmp67 [V86,T91] ( 2, 2.50) simd16 -> mm7 "Inline stloc first use temp"
; V87 tmp68 [V87,T11] ( 6, 9 ) long -> rdx "Inline stloc first use temp"
; V88 tmp69 [V88,T19] ( 5, 7 ) long -> r10 "Inline stloc first use temp"
; V89 tmp70 [V89,T67] ( 11, 22 ) simd16 -> mm8 "Inline stloc first use temp"
; V90 tmp71 [V90,T72] ( 2, 4 ) simd16 -> mm9 "Inline stloc first use temp"
; V91 tmp72 [V91,T73] ( 2, 4 ) simd16 -> mm9 "Inline stloc first use temp"
; V92 tmp73 [V92,T74] ( 2, 4 ) simd16 -> mm9 "Inline stloc first use temp"
; V93 tmp74 [V93,T75] ( 2, 4 ) simd16 -> mm9 "Inline stloc first use temp"
;* V94 tmp75 [V94 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V95 tmp76 [V95 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V96 tmp77 [V96 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V97 tmp78 [V97 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V98 tmp79 [V98 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
; V99 tmp80 [V99,T105] ( 2, 1 ) simd16 -> mm1 "Inline return value spill temp"
; V100 tmp81 [V100,T106] ( 2, 1 ) simd16 -> mm1 "Inline stloc first use temp"
; V101 tmp82 [V101,T107] ( 2, 1 ) simd16 -> mm2 "Inline return value spill temp"
; V102 tmp83 [V102,T108] ( 2, 1 ) simd16 -> mm2 "Inline stloc first use temp"
; V103 tmp84 [V103,T109] ( 2, 1 ) simd16 -> mm3 "Inline return value spill temp"
; V104 tmp85 [V104,T110] ( 2, 1 ) simd16 -> mm3 "Inline stloc first use temp"
; V105 tmp86 [V105,T111] ( 2, 1 ) simd16 -> mm4 "Inline return value spill temp"
; V106 tmp87 [V106,T112] ( 2, 1 ) simd16 -> mm4 "Inline stloc first use temp"
; V107 tmp88 [V107,T113] ( 2, 1 ) simd16 -> mm5 "Inline return value spill temp"
; V108 tmp89 [V108,T114] ( 2, 1 ) simd16 -> mm5 "Inline stloc first use temp"
; V109 tmp90 [V109,T115] ( 2, 1 ) simd16 -> mm6 "Inline return value spill temp"
; V110 tmp91 [V110,T116] ( 2, 1 ) simd16 -> mm6 "Inline stloc first use temp"
;* V111 tmp92 [V111 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V112 tmp93 [V112 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V113 tmp94 [V113 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V114 tmp95 [V114 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V115 tmp96 [V115 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V116 tmp97 [V116 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V117 tmp98 [V117 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V118 tmp99 [V118 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V119 tmp100 [V119 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V120 tmp101 [V120,T12] ( 2, 8 ) int -> rbx "Inline stloc first use temp"
; V121 tmp102 [V121,T03] ( 2, 16 ) int -> rax "impAppendStmt"
; V122 tmp103 [V122,T13] ( 2, 8 ) int -> r14 "Inline stloc first use temp"
; V123 tmp104 [V123,T01] ( 5, 20 ) int -> rax "Inline stloc first use temp"
; V124 tmp105 [V124,T04] ( 2, 16 ) int -> rbx "impAppendStmt"
; V125 tmp106 [V125,T14] ( 2, 8 ) int -> r14 "Inline stloc first use temp"
; V126 tmp107 [V126,T15] ( 2, 8 ) int -> r15 "Inline stloc first use temp"
; V127 tmp108 [V127,T16] ( 2, 8 ) int -> rax "Inline stloc first use temp"
;* V128 tmp109 [V128 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V129 tmp110 [V129,T05] ( 2, 16 ) long -> rbx "NewObj constructor temp"
;* V130 tmp111 [V130 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V131 tmp112 [V131 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V132 tmp113 [V132,T06] ( 2, 16 ) long -> r14 "NewObj constructor temp"
;* V133 tmp114 [V133 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V134 tmp115 [V134 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V135 tmp116 [V135,T07] ( 2, 16 ) long -> r15 "NewObj constructor temp"
;* V136 tmp117 [V136 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V137 tmp118 [V137 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V138 tmp119 [V138,T08] ( 2, 16 ) long -> rax "NewObj constructor temp"
;* V139 tmp120 [V139 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V140 tmp121 [V140 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V141 tmp122 [V141 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V142 tmp123 [V142,T45] ( 3, 1.50) int -> rcx "Inline stloc first use temp"
; V143 tmp124 [V143,T32] ( 2, 2 ) int -> rax "impAppendStmt"
; V144 tmp125 [V144,T48] ( 2, 1 ) int -> rcx "Inline stloc first use temp"
;* V145 tmp126 [V145 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V146 tmp127 [V146,T33] ( 2, 2 ) long -> rax "NewObj constructor temp"
;* V147 tmp128 [V147 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V148 tmp129 [V148 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V149 tmp130 [V149,T34] ( 2, 2 ) long -> rcx "NewObj constructor temp"
;* V150 tmp131 [V150 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V151 tmp132 [V151 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V152 tmp133 [V152 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V153 tmp134 [V153,T49] ( 2, 1 ) int -> rcx "Inline stloc first use temp"
; V154 tmp135 [V154,T35] ( 2, 2 ) int -> rax "impAppendStmt"
; V155 tmp136 [V155,T29] ( 4, 2 ) int -> rax "Inline stloc first use temp"
; V156 tmp137 [V156,T36] ( 2, 2 ) int -> rcx "impAppendStmt"
; V157 tmp138 [V157,T50] ( 2, 1 ) int -> r11 "Inline stloc first use temp"
; V158 tmp139 [V158,T51] ( 2, 1 ) int -> rax "Inline stloc first use temp"
;* V159 tmp140 [V159 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V160 tmp141 [V160,T37] ( 2, 2 ) long -> rcx "NewObj constructor temp"
;* V161 tmp142 [V161 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V162 tmp143 [V162 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V163 tmp144 [V163,T38] ( 2, 2 ) long -> r11 "NewObj constructor temp"
;* V164 tmp145 [V164 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V165 tmp146 [V165 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V166 tmp147 [V166,T39] ( 2, 2 ) long -> rax "NewObj constructor temp"
;* V167 tmp148 [V167 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V168 tmp149 [V168 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V169 tmp150 [V169,T46] ( 0, 0 ) byref -> zero-ref V20._pointer(offs=0x00) P-INDEP "field V20._pointer (fldOffset=0x0)"
;* V170 tmp151 [V170 ] ( 0, 0 ) int -> zero-ref V20._length(offs=0x08) P-INDEP "field V20._length (fldOffset=0x8)"
; V171 tmp152 [V171,T42] ( 3, 1.50) byref -> rdi V21._pointer(offs=0x00) P-INDEP "field V21._pointer (fldOffset=0x0)"
;* V172 tmp153 [V172 ] ( 0, 0 ) int -> zero-ref V21._length(offs=0x08) P-INDEP "field V21._length (fldOffset=0x8)"
; V173 tmp154 [V173,T43] ( 3, 1.50) byref -> rsi V22._pointer(offs=0x00) P-INDEP "field V22._pointer (fldOffset=0x0)"
;* V174 tmp155 [V174 ] ( 0, 0 ) int -> zero-ref V22._length(offs=0x08) P-INDEP "field V22._length (fldOffset=0x8)"
;* V175 tmp156 [V175 ] ( 0, 0 ) byref -> zero-ref V23._pointer(offs=0x00) P-INDEP "field V23._pointer (fldOffset=0x0)"
;* V176 tmp157 [V176 ] ( 0, 0 ) int -> zero-ref V23._length(offs=0x08) P-INDEP "field V23._length (fldOffset=0x8)"
;* V177 tmp158 [V177 ] ( 0, 0 ) byref -> zero-ref V35._pointer(offs=0x00) P-INDEP "field V35._pointer (fldOffset=0x0)"
;* V178 tmp159 [V178 ] ( 0, 0 ) int -> zero-ref V35._length(offs=0x08) P-INDEP "field V35._length (fldOffset=0x8)"
;* V179 tmp160 [V179 ] ( 0, 0 ) byref -> zero-ref V40._pointer(offs=0x00) P-INDEP "field V40._pointer (fldOffset=0x0)"
;* V180 tmp161 [V180 ] ( 0, 0 ) int -> zero-ref V40._length(offs=0x08) P-INDEP "field V40._length (fldOffset=0x8)"
;* V181 tmp162 [V181,T53] ( 0, 0 ) byref -> zero-ref V46._pointer(offs=0x00) P-INDEP "field V46._pointer (fldOffset=0x0)"
;* V182 tmp163 [V182 ] ( 0, 0 ) int -> zero-ref V46._length(offs=0x08) P-INDEP "field V46._length (fldOffset=0x8)"
;* V183 tmp164 [V183,T54] ( 0, 0 ) byref -> zero-ref V47._value(offs=0x00) P-INDEP "field V47._value (fldOffset=0x0)"
;* V184 tmp165 [V184 ] ( 0, 0 ) byref -> zero-ref V48._pointer(offs=0x00) P-INDEP "field V48._pointer (fldOffset=0x0)"
;* V185 tmp166 [V185 ] ( 0, 0 ) int -> zero-ref V48._length(offs=0x08) P-INDEP "field V48._length (fldOffset=0x8)"
;* V186 tmp167 [V186 ] ( 0, 0 ) byref -> zero-ref V49._pointer(offs=0x00) P-INDEP "field V49._pointer (fldOffset=0x0)"
;* V187 tmp168 [V187 ] ( 0, 0 ) int -> zero-ref V49._length(offs=0x08) P-INDEP "field V49._length (fldOffset=0x8)"
;* V188 tmp169 [V188,T55] ( 0, 0 ) byref -> zero-ref V63._pointer(offs=0x00) P-INDEP "field V63._pointer (fldOffset=0x0)"
;* V189 tmp170 [V189 ] ( 0, 0 ) int -> zero-ref V63._length(offs=0x08) P-INDEP "field V63._length (fldOffset=0x8)"
;* V190 tmp171 [V190,T56] ( 0, 0 ) byref -> zero-ref V64._value(offs=0x00) P-INDEP "field V64._value (fldOffset=0x0)"
;* V191 tmp172 [V191 ] ( 0, 0 ) byref -> zero-ref V65._pointer(offs=0x00) P-INDEP "field V65._pointer (fldOffset=0x0)"
;* V192 tmp173 [V192 ] ( 0, 0 ) int -> zero-ref V65._length(offs=0x08) P-INDEP "field V65._length (fldOffset=0x8)"
;* V193 tmp174 [V193 ] ( 0, 0 ) byref -> zero-ref V66._pointer(offs=0x00) P-INDEP "field V66._pointer (fldOffset=0x0)"
;* V194 tmp175 [V194 ] ( 0, 0 ) int -> zero-ref V66._length(offs=0x08) P-INDEP "field V66._length (fldOffset=0x8)"
;* V195 tmp176 [V195,T57] ( 0, 0 ) byref -> zero-ref V68._pointer(offs=0x00) P-INDEP "field V68._pointer (fldOffset=0x0)"
;* V196 tmp177 [V196 ] ( 0, 0 ) int -> zero-ref V68._length(offs=0x08) P-INDEP "field V68._length (fldOffset=0x8)"
;* V197 tmp178 [V197,T58] ( 0, 0 ) byref -> zero-ref V69._value(offs=0x00) P-INDEP "field V69._value (fldOffset=0x0)"
;* V198 tmp179 [V198 ] ( 0, 0 ) byref -> zero-ref V70._pointer(offs=0x00) P-INDEP "field V70._pointer (fldOffset=0x0)"
;* V199 tmp180 [V199 ] ( 0, 0 ) int -> zero-ref V70._length(offs=0x08) P-INDEP "field V70._length (fldOffset=0x8)"
;* V200 tmp181 [V200 ] ( 0, 0 ) byref -> zero-ref V71._pointer(offs=0x00) P-INDEP "field V71._pointer (fldOffset=0x0)"
;* V201 tmp182 [V201 ] ( 0, 0 ) int -> zero-ref V71._length(offs=0x08) P-INDEP "field V71._length (fldOffset=0x8)"
;* V202 tmp183 [V202 ] ( 0, 0 ) byref -> zero-ref V73._pointer(offs=0x00) P-INDEP "field V73._pointer (fldOffset=0x0)"
;* V203 tmp184 [V203 ] ( 0, 0 ) int -> zero-ref V73._length(offs=0x08) P-INDEP "field V73._length (fldOffset=0x8)"
;* V204 tmp185 [V204 ] ( 0, 0 ) byref -> zero-ref V85._pointer(offs=0x00) P-INDEP "field V85._pointer (fldOffset=0x0)"
;* V205 tmp186 [V205 ] ( 0, 0 ) int -> zero-ref V85._length(offs=0x08) P-INDEP "field V85._length (fldOffset=0x8)"
;* V206 tmp187 [V206,T59] ( 0, 0 ) byref -> zero-ref V94._pointer(offs=0x00) P-INDEP "field V94._pointer (fldOffset=0x0)"
;* V207 tmp188 [V207 ] ( 0, 0 ) int -> zero-ref V94._length(offs=0x08) P-INDEP "field V94._length (fldOffset=0x8)"
;* V208 tmp189 [V208,T60] ( 0, 0 ) byref -> zero-ref V95._value(offs=0x00) P-INDEP "field V95._value (fldOffset=0x0)"
;* V209 tmp190 [V209 ] ( 0, 0 ) byref -> zero-ref V96._pointer(offs=0x00) P-INDEP "field V96._pointer (fldOffset=0x0)"
;* V210 tmp191 [V210 ] ( 0, 0 ) int -> zero-ref V96._length(offs=0x08) P-INDEP "field V96._length (fldOffset=0x8)"
;* V211 tmp192 [V211 ] ( 0, 0 ) byref -> zero-ref V97._pointer(offs=0x00) P-INDEP "field V97._pointer (fldOffset=0x0)"
;* V212 tmp193 [V212 ] ( 0, 0 ) int -> zero-ref V97._length(offs=0x08) P-INDEP "field V97._length (fldOffset=0x8)"
;* V213 tmp194 [V213,T61] ( 0, 0 ) byref -> zero-ref V111._pointer(offs=0x00) P-INDEP "field V111._pointer (fldOffset=0x0)"
;* V214 tmp195 [V214 ] ( 0, 0 ) int -> zero-ref V111._length(offs=0x08) P-INDEP "field V111._length (fldOffset=0x8)"
;* V215 tmp196 [V215,T62] ( 0, 0 ) byref -> zero-ref V112._value(offs=0x00) P-INDEP "field V112._value (fldOffset=0x0)"
;* V216 tmp197 [V216 ] ( 0, 0 ) byref -> zero-ref V113._pointer(offs=0x00) P-INDEP "field V113._pointer (fldOffset=0x0)"
;* V217 tmp198 [V217 ] ( 0, 0 ) int -> zero-ref V113._length(offs=0x08) P-INDEP "field V113._length (fldOffset=0x8)"
;* V218 tmp199 [V218 ] ( 0, 0 ) byref -> zero-ref V114._pointer(offs=0x00) P-INDEP "field V114._pointer (fldOffset=0x0)"
;* V219 tmp200 [V219 ] ( 0, 0 ) int -> zero-ref V114._length(offs=0x08) P-INDEP "field V114._length (fldOffset=0x8)"
;* V220 tmp201 [V220,T47] ( 0, 0 ) byref -> zero-ref V116._pointer(offs=0x00) P-INDEP "field V116._pointer (fldOffset=0x0)"
;* V221 tmp202 [V221 ] ( 0, 0 ) int -> zero-ref V116._length(offs=0x08) P-INDEP "field V116._length (fldOffset=0x8)"
;* V222 tmp203 [V222,T63] ( 0, 0 ) byref -> zero-ref V117._value(offs=0x00) P-INDEP "field V117._value (fldOffset=0x0)"
;* V223 tmp204 [V223,T64] ( 0, 0 ) byref -> zero-ref V118._pointer(offs=0x00) P-INDEP "field V118._pointer (fldOffset=0x0)"
;* V224 tmp205 [V224 ] ( 0, 0 ) int -> zero-ref V118._length(offs=0x08) P-INDEP "field V118._length (fldOffset=0x8)"
; V225 tmp206 [V225,T30] ( 2, 2 ) byref -> rax "BlockOp address local"
; V226 tmp207 [V226,T40] ( 2, 2 ) long -> rdi "Cast away GC"
; V227 tmp208 [V227,T31] ( 2, 2 ) byref -> rax "BlockOp address local"
; V228 tmp209 [V228,T41] ( 2, 2 ) long -> rsi "Cast away GC"
; V229 rat0 [V229,T27] ( 3, 3 ) int -> rdx "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 48
G_M39793_IG01:
55 push rbp
4157 push r15
4156 push r14
4154 push r12
53 push rbx
4883EC30 sub rsp, 48
C5F877 vzeroupper
488D6C2450 lea rbp, [rsp+50H]
33C0 xor rax, rax
488945B8 mov qword ptr [rbp-48H], rax
488945B0 mov qword ptr [rbp-50H], rax
48897DD0 mov bword ptr [rbp-30H], rdi
488975D8 mov qword ptr [rbp-28H], rsi
488955C0 mov bword ptr [rbp-40H], rdx
48894DC8 mov qword ptr [rbp-38H], rcx
G_M39793_IG02:
837DD800 cmp dword ptr [rbp-28H], 0
7718 ja SHORT G_M39793_IG04
33C0 xor eax, eax
418900 mov dword ptr [r8], eax
418901 mov dword ptr [r9], eax
G_M39793_IG03:
C5F877 vzeroupper
488D65E0 lea rsp, [rbp-20H]
5B pop rbx
415C pop r12
415E pop r14
415F pop r15
5D pop rbp
C3 ret
G_M39793_IG04:
488D45D0 lea rax, bword ptr [rbp-30H]
488B38 mov rdi, bword ptr [rax]
48897DB8 mov bword ptr [rbp-48H], rdi
488D45C0 lea rax, bword ptr [rbp-40H]
488B30 mov rsi, bword ptr [rax]
488975B0 mov bword ptr [rbp-50H], rsi
8B4DD8 mov ecx, dword ptr [rbp-28H]
448B55C8 mov r10d, dword ptr [rbp-38H]
81F9FDFFFF5F cmp ecx, 0x5FFFFFFD
7F2D jg SHORT G_M39793_IG06
81F9FDFFFF5F cmp ecx, 0x5FFFFFFD
0F871E040000 ja G_M39793_IG24
G_M39793_IG05:
8D5102 lea edx, [rcx+2]
41BB56555555 mov r11d, 0x55555556
418BC3 mov eax, r11d
F7EA imul edx:eax, edx
8BC2 mov eax, edx
C1E81F shr eax, 31
03C2 add eax, edx
C1E002 shl eax, 2
413BC2 cmp eax, r10d
7F04 jg SHORT G_M39793_IG06
8BC1 mov eax, ecx
EB08 jmp SHORT G_M39793_IG07
G_M39793_IG06:
41C1FA02 sar r10d, 2
438D0452 lea eax, [r10+2*r10]
G_M39793_IG07:
488BD7 mov rdx, rdi
4C8BD6 mov r10, rsi
8BC9 mov ecx, ecx
4803CA add rcx, rdx
448BD8 mov r11d, eax
4C03DA add r11, rdx
83F810 cmp eax, 16
0F8CF1010000 jl G_M39793_IG12
498D43E0 lea rax, [r11-32]
483BC7 cmp rax, rdi
0F8202010000 jb G_M39793_IG10
48BA67DD95C2D27F0000 mov rdx, 0x7FD2C295DD67
C5FD1002 vmovupd ymm0, ymmword ptr[rdx]
41BA00FCC00F mov r10d, 0xFC0FC00
C4C1796ECA vmovd xmm1, r10d
C4E27D58C9 vpbroadcastd ymm1, ymm1
BAF0033F00 mov edx, 0x3F03F0
C5F96ED2 vmovd xmm2, edx
C4E27D58D2 vpbroadcastd ymm2, ymm2
BA40000004 mov edx, 0x4000040
C5F96EDA vmovd xmm3, edx
C4E27D58DB vpbroadcastd ymm3, ymm3
BA10000001 mov edx, 0x1000010
C5F96EE2 vmovd xmm4, edx
C4E27D58E4 vpbroadcastd ymm4, ymm4
BA33000000 mov edx, 51
C5F96EEA vmovd xmm5, edx
C4E27D78ED vpbroadcastb ymm5, ymm5
BA19000000 mov edx, 25
C5F96EF2 vmovd xmm6, edx
C4E27D78F6 vpbroadcastb ymm6, ymm6
48BA87DE95C2D27F0000 mov rdx, 0x7FD2C295DE87
C5FD103A vmovupd ymm7, ymmword ptr[rdx]
488BD6 mov rdx, rsi
C57E6F07 vmovdqu ymm8, ymmword ptr[rdi]
49BA0FDE95C2D27F0000 mov r10, 0x7FD2C295DE0F
C4417D100A vmovupd ymm9, ymmword ptr[r10]
C4423536C0 vpermd ymm8, ymm9, ymm8
4C8D57FC lea r10, [rdi-4]
G_M39793_IG08:
C4623D00C0 vpshufb ymm8, ymm8, ymm0
C53DDBCA vpand ymm9, ymm8, ymm2
C535D5CC vpmullw ymm9, ymm9, ymm4
C53DDBC1 vpand ymm8, ymm8, ymm1
C53DE4C3 vpmulhuw ymm8, ymm8, ymm3
C4413DEBC1 vpor ymm8, ymm8, ymm9
C53D64CE vpcmpgtb ymm9, ymm8, ymm6
C53DD8D5 vpsubusb ymm10, ymm8, ymm5
C4412DF8C9 vpsubb ymm9, ymm10, ymm9
C4424500C9 vpshufb ymm9, ymm7, ymm9
C4413DFCC1 vpaddb ymm8, ymm8, ymm9
C57E7F02 vmovdqu ymmword ptr[rdx], ymm8
4983C218 add r10, 24
4883C220 add rdx, 32
4C3BD0 cmp r10, rax
770E ja SHORT G_M39793_IG09
C4417E6F02 vmovdqu ymm8, ymmword ptr[r10]
EBB7 jmp SHORT G_M39793_IG08
G_M39793_IG09:
498D4204 lea rax, [r10+4]
4C8BD2 mov r10, rdx
483BC1 cmp rax, rcx
0F845B020000 je G_M39793_IG17
488BD0 mov rdx, rax
G_M39793_IG10:
498D43F0 lea rax, [r11-16]
483BC2 cmp rax, rdx
0F82D5000000 jb G_M39793_IG12
48BBB7DD95C2D27F0000 mov rbx, 0x7FD2C295DDB7
C5F91003 vmovupd xmm0, xmmword ptr [rbx]
BB00FCC00F mov ebx, 0xFC0FC00
C5F96ECB vmovd xmm1, ebx
C4E27958C9 vpbroadcastd xmm1, xmm1
BBF0033F00 mov ebx, 0x3F03F0
C5F96ED3 vmovd xmm2, ebx
C4E27958D2 vpbroadcastd xmm2, xmm2
BB40000004 mov ebx, 0x4000040
C5F96EDB vmovd xmm3, ebx
C4E27958DB vpbroadcastd xmm3, xmm3
BB10000001 mov ebx, 0x1000010
C5F96EE3 vmovd xmm4, ebx
C4E27958E4 vpbroadcastd xmm4, xmm4
BB33000000 mov ebx, 51
C5F96EEB vmovd xmm5, ebx
C4E27978ED vpbroadcastb xmm5, xmm5
BB19000000 mov ebx, 25
C5F96EF3 vmovd xmm6, ebx
C4E27978F6 vpbroadcastb xmm6, xmm6
48BB3FDE95C2D27F0000 mov rbx, 0x7FD2C295DE3F
C5F9103B vmovupd xmm7, xmmword ptr [rbx]
G_M39793_IG11:
C57A6F02 vmovdqu xmm8, xmmword ptr [rdx]
C4623900C0 vpshufb xmm8, xmm8, xmm0
C539DBCA vpand xmm9, xmm8, xmm2
C531D5CC vpmullw xmm9, xmm9, xmm4
C539DBC1 vpand xmm8, xmm8, xmm1
C539E4C3 vpmulhuw xmm8, xmm8, xmm3
C44139EBC1 vpor xmm8, xmm8, xmm9
C53964CE vpcmpgtb xmm9, xmm8, xmm6
C539D8D5 vpsubusb xmm10, xmm8, xmm5
C44129F8C9 vpsubb xmm9, xmm10, xmm9
C4424100C9 vpshufb xmm9, xmm7, xmm9
C44139FCC1 vpaddb xmm8, xmm8, xmm9
C4417A7F02 vmovdqu xmmword ptr [r10], xmm8
4883C20C add rdx, 12
4983C210 add r10, 16
483BD0 cmp rdx, rax
76B9 jbe SHORT G_M39793_IG11
483BD1 cmp rdx, rcx
0F8405010000 je G_M39793_IG15
G_M39793_IG12:
4983C3FE add r11, -2
493BD3 cmp rdx, r11
0F838E000000 jae G_M39793_IG14
G_M39793_IG13:
0FB602 movzx rax, byte ptr [rdx]
0FB65A01 movzx rbx, byte ptr [rdx+1]
440FB67202 movzx r14, byte ptr [rdx+2]
C1E010 shl eax, 16
C1E308 shl ebx, 8
0BC3 or eax, ebx
410BC6 or eax, r14d
8BD8 mov ebx, eax
C1EB12 shr ebx, 18
49BE27DC95C2D27F0000 mov r14, 0x7FD2C295DC27
420FB61C33 movzx rbx, byte ptr [rbx+r14]
448BF0 mov r14d, eax
41C1EE0C shr r14d, 12
4183E63F and r14d, 63
49BF27DC95C2D27F0000 mov r15, 0x7FD2C295DC27
470FB6343E movzx r14, byte ptr [r14+r15]
448BF8 mov r15d, eax
41C1EF06 shr r15d, 6
4183E73F and r15d, 63
49BC27DC95C2D27F0000 mov r12, 0x7FD2C295DC27
470FB63C27 movzx r15, byte ptr [r15+r12]
83E03F and eax, 63
420FB60420 movzx rax, byte ptr [rax+r12]
41C1E608 shl r14d, 8
410BDE or ebx, r14d
41C1E710 shl r15d, 16
410BDF or ebx, r15d
C1E018 shl eax, 24
0BC3 or eax, ebx
418902 mov dword ptr [r10], eax
4883C203 add rdx, 3
4983C204 add r10, 4
493BD3 cmp rdx, r11
0F8272FFFFFF jb G_M39793_IG13
G_M39793_IG14:
498D4302 lea rax, [r11+2]
483BC1 cmp rax, rcx
0F85F4000000 jne G_M39793_IG20
807D1000 cmp byte ptr [rbp+10H], 0
0F8411010000 je G_M39793_IG22
488D4201 lea rax, [rdx+1]
483BC1 cmp rax, rcx
7548 jne SHORT G_M39793_IG16
0FB60A movzx rcx, byte ptr [rdx]
C1E108 shl ecx, 8
8BC1 mov eax, ecx
C1E80A shr eax, 10
49BB27DC95C2D27F0000 mov r11, 0x7FD2C295DC27
420FB60418 movzx rax, byte ptr [rax+r11]
C1E904 shr ecx, 4
83E13F and ecx, 63
420FB60C19 movzx rcx, byte ptr [rcx+r11]
C1E108 shl ecx, 8
0BC1 or eax, ecx
0D00003D00 or eax, 0x3D0000
0D0000003D or eax, 0x3D000000
418902 mov dword ptr [r10], eax
48FFC2 inc rdx
4983C204 add r10, 4
488BC2 mov rax, rdx
EB78 jmp SHORT G_M39793_IG17
G_M39793_IG15:
488BC2 mov rax, rdx
EB73 jmp SHORT G_M39793_IG17
G_M39793_IG16:
488D4202 lea rax, [rdx+2]
483BC1 cmp rax, rcx
0F8587000000 jne G_M39793_IG19
0FB602 movzx rax, byte ptr [rdx]
0FB64A01 movzx rcx, byte ptr [rdx+1]
C1E010 shl eax, 16
C1E108 shl ecx, 8
0BC1 or eax, ecx
8BC8 mov ecx, eax
C1E912 shr ecx, 18
49BB27DC95C2D27F0000 mov r11, 0x7FD2C295DC27
420FB60C19 movzx rcx, byte ptr [rcx+r11]
448BD8 mov r11d, eax
41C1EB0C shr r11d, 12
4183E33F and r11d, 63
48BB27DC95C2D27F0000 mov rbx, 0x7FD2C295DC27
450FB61C1B movzx r11, byte ptr [r11+rbx]
C1E806 shr eax, 6
83E03F and eax, 63
0FB60418 movzx rax, byte ptr [rax+rbx]
41C1E308 shl r11d, 8
410BCB or ecx, r11d
C1E010 shl eax, 16
0BC1 or eax, ecx
0D0000003D or eax, 0x3D000000
418902 mov dword ptr [r10], eax
4883C202 add rdx, 2
4983C204 add r10, 4
488BC2 mov rax, rdx
G_M39793_IG17:
482BC7 sub rax, rdi
418900 mov dword ptr [r8], eax
498BC2 mov rax, r10
482BC6 sub rax, rsi
418901 mov dword ptr [r9], eax
33C0 xor eax, eax
G_M39793_IG18:
C5F877 vzeroupper
488D65E0 lea rsp, [rbp-20H]
5B pop rbx
415C pop r12
415E pop r14
415F pop r15
5D pop rbp
C3 ret
G_M39793_IG19:
488BC2 mov rax, rdx
EBDA jmp SHORT G_M39793_IG17
G_M39793_IG20:
488BC2 mov rax, rdx
482BC7 sub rax, rdi
418900 mov dword ptr [r8], eax
498BC2 mov rax, r10
482BC6 sub rax, rsi
418901 mov dword ptr [r9], eax
B801000000 mov eax, 1
G_M39793_IG21:
C5F877 vzeroupper
488D65E0 lea rsp, [rbp-20H]
5B pop rbx
415C pop r12
415E pop r14
415F pop r15
5D pop rbp
C3 ret
G_M39793_IG22:
488BC2 mov rax, rdx
482BC7 sub rax, rdi
418900 mov dword ptr [r8], eax
498BC2 mov rax, r10
482BC6 sub rax, rsi
418901 mov dword ptr [r9], eax
B802000000 mov eax, 2
G_M39793_IG23:
C5F877 vzeroupper
488D65E0 lea rsp, [rbp-20H]
5B pop rbx
415C pop r12
415E pop r14
415F pop r15
5D pop rbp
C3 ret
G_M39793_IG24:
33FF xor edi, edi
E8F0ECFFFF call ThrowHelper:ThrowArgumentOutOfRangeException(int)
CC int3
; Total bytes of code 1145, prolog size 46 for method Base64:EncodeToUtf8(struct,struct,byref,byref,bool):int
; ============================================================ dasm for decode
; Assembly listing for method Base64:DecodeFromUtf8(struct,struct,byref,byref,bool):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; optimized code
; rbp based frame
; fully interruptible
; Final local variable assignments
;
; V00 arg0 [V00 ] ( 7, 5 ) struct (16) [rbp-0x38] do-not-enreg[XSFB] addr-exposed ld-addr-op
; V01 arg1 [V01 ] ( 4, 3 ) struct (16) [rbp-0x48] do-not-enreg[XSFB] addr-exposed ld-addr-op
; V02 arg2 [V02,T20] ( 7, 4.50) byref -> r8
; V03 arg3 [V03,T21] ( 7, 4.50) byref -> r9
; V04 arg4 [V04,T49] ( 3, 1.50) bool -> [rbp+0x10]
; V05 loc0 [V05,T25] ( 9, 4.50) long -> rsi
; V06 loc1 [V06 ] ( 1, 0.50) byref -> [rbp-0x50] must-init pinned
; V07 loc2 [V07,T26] ( 8, 4 ) long -> rcx
; V08 loc3 [V08 ] ( 1, 0.50) byref -> [rbp-0x58] must-init pinned
; V09 loc4 [V09,T28] ( 7, 3.50) int -> [rbp-0x5C]
; V10 loc5 [V10,T29] ( 6, 3 ) int -> [rbp-0x60]
; V11 loc6 [V11,T24] ( 10, 5 ) int -> rax
; V12 loc7 [V12,T50] ( 3, 1.50) int -> rbx
; V13 loc8 [V13,T00] ( 24, 36.50) long -> r14 ld-addr-op
; V14 loc9 [V14,T01] ( 28, 31.50) long -> r15 ld-addr-op
; V15 loc10 [V15,T27] ( 8, 4 ) long -> r12
; V16 loc11 [V16,T22] ( 6, 6.50) long -> rdx
; V17 loc12 [V17,T51] ( 3, 1.50) int -> [rbp-0x64]
;* V18 loc13 [V18,T59] ( 0, 0 ) byref -> zero-ref
; V19 loc14 [V19,T56] ( 2, 1 ) int -> rax
; V20 loc15 [V20,T57] ( 2, 1 ) int -> rdi
; V21 loc16 [V21,T32] ( 4, 2 ) int -> rdx
; V22 loc17 [V22,T52] ( 3, 1.50) int -> r11
; V23 loc18 [V23,T10] ( 20, 10 ) int -> rax
; V24 loc19 [V24,T33] ( 4, 2 ) int -> rdi
; V25 loc20 [V25,T34] ( 4, 2 ) long -> rdi
; V26 loc21 [V26,T23] ( 6, 6 ) long -> rax
; V27 loc22 [V27,T03] ( 5, 20 ) int -> r10
; V28 loc23 [V28,T35] ( 4, 2 ) int -> rdx
; V29 loc24 [V29,T58] ( 2, 1 ) int -> r11
; V30 loc25 [V30,T36] ( 4, 2 ) int -> rdx
;# V31 OutArgs [V31 ] ( 1, 1 ) lclBlk ( 0) [rsp+0x00] "OutgoingArgSpace"
; V32 tmp1 [V32,T53] ( 3, 1.50) int -> r13
;* V33 tmp2 [V33 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
;* V34 tmp3 [V34 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V35 tmp4 [V35 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V36 tmp5 [V36 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V37 tmp6 [V37,T98] ( 2, 2.50) simd32 -> mm0 "Inline stloc first use temp"
;* V38 tmp7 [V38 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V39 tmp8 [V39,T99] ( 2, 2.50) simd32 -> mm1 "Inline stloc first use temp"
;* V40 tmp9 [V40 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V41 tmp10 [V41,T100] ( 2, 2.50) simd32 -> mm2 "Inline stloc first use temp"
;* V42 tmp11 [V42 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V43 tmp12 [V43,T86] ( 4, 6.50) simd32 -> mm3 "Inline stloc first use temp"
;* V44 tmp13 [V44 ] ( 0, 0 ) simd32 -> zero-ref "struct address for call/obj"
; V45 tmp14 [V45,T101] ( 2, 2.50) simd32 -> mm4 "Inline stloc first use temp"
;* V46 tmp15 [V46 ] ( 0, 0 ) simd32 -> zero-ref "struct address for call/obj"
; V47 tmp16 [V47,T102] ( 2, 2.50) simd32 -> mm5 "Inline stloc first use temp"
;* V48 tmp17 [V48 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V49 tmp18 [V49,T103] ( 2, 2.50) simd32 -> mm6 "Inline stloc first use temp"
;* V50 tmp19 [V50 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V51 tmp20 [V51,T112] ( 2, 2 ) simd32 -> mm7 "struct address for call/obj"
; V52 tmp21 [V52,T104] ( 2, 2.50) simd32 -> mm7 "Inline stloc first use temp"
; V53 tmp22 [V53,T11] ( 6, 9 ) long -> r14 "Inline stloc first use temp"
; V54 tmp23 [V54,T18] ( 5, 7 ) long -> r15 "Inline stloc first use temp"
; V55 tmp24 [V55,T84] ( 9, 18 ) simd32 -> mm8 "Inline stloc first use temp"
; V56 tmp25 [V56,T88] ( 3, 6 ) simd32 -> mm9 "Inline stloc first use temp"
; V57 tmp26 [V57,T90] ( 2, 4 ) simd32 -> mm10 "Inline stloc first use temp"
; V58 tmp27 [V58,T91] ( 2, 4 ) simd32 -> mm11 "Inline stloc first use temp"
; V59 tmp28 [V59,T92] ( 2, 4 ) simd32 -> mm10 "Inline stloc first use temp"
; V60 tmp29 [V60,T93] ( 2, 4 ) simd32 -> mm9 "Inline stloc first use temp"
;* V61 tmp30 [V61 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V62 tmp31 [V62 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V63 tmp32 [V63 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V64 tmp33 [V64 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V65 tmp34 [V65 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V66 tmp35 [V66 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V67 tmp36 [V67 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V68 tmp37 [V68 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V69 tmp38 [V69 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V70 tmp39 [V70 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V71 tmp40 [V71 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V72 tmp41 [V72 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V73 tmp42 [V73 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V74 tmp43 [V74 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V75 tmp44 [V75 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V76 tmp45 [V76 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V77 tmp46 [V77 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V78 tmp47 [V78 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V79 tmp48 [V79 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V80 tmp49 [V80 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
; V81 tmp50 [V81,T113] ( 2, 1 ) simd32 -> mm4 "Inline return value spill temp"
; V82 tmp51 [V82,T114] ( 2, 1 ) simd16 -> mm4 "Inline stloc first use temp"
; V83 tmp52 [V83,T115] ( 2, 1 ) simd32 -> mm5 "Inline return value spill temp"
; V84 tmp53 [V84,T116] ( 2, 1 ) simd16 -> mm5 "Inline stloc first use temp"
;* V85 tmp54 [V85 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V86 tmp55 [V86 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V87 tmp56 [V87 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V88 tmp57 [V88 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V89 tmp58 [V89 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V90 tmp59 [V90 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V91 tmp60 [V91 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V92 tmp61 [V92 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V93 tmp62 [V93 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V94 tmp63 [V94 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V95 tmp64 [V95 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V96 tmp65 [V96,T105] ( 2, 2.50) simd16 -> mm0 "Inline stloc first use temp"
;* V97 tmp66 [V97 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V98 tmp67 [V98,T106] ( 2, 2.50) simd16 -> mm1 "Inline stloc first use temp"
;* V99 tmp68 [V99 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V100 tmp69 [V100,T107] ( 2, 2.50) simd16 -> mm2 "Inline stloc first use temp"
;* V101 tmp70 [V101 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V102 tmp71 [V102,T87] ( 4, 6.50) simd16 -> mm3 "Inline stloc first use temp"
;* V103 tmp72 [V103 ] ( 0, 0 ) simd16 -> zero-ref "struct address for call/obj"
; V104 tmp73 [V104,T108] ( 2, 2.50) simd16 -> mm4 "Inline stloc first use temp"
;* V105 tmp74 [V105 ] ( 0, 0 ) simd16 -> zero-ref "struct address for call/obj"
; V106 tmp75 [V106,T109] ( 2, 2.50) simd16 -> mm5 "Inline stloc first use temp"
;* V107 tmp76 [V107 ] ( 0, 0 ) struct (16) zero-ref "struct address for call/obj"
; V108 tmp77 [V108,T110] ( 2, 2.50) simd16 -> mm6 "Inline stloc first use temp"
; V109 tmp78 [V109,T111] ( 2, 2.50) simd16 -> mm7 "Inline stloc first use temp"
; V110 tmp79 [V110,T12] ( 6, 9 ) long -> r14 "Inline stloc first use temp"
; V111 tmp80 [V111,T19] ( 5, 7 ) long -> r15 "Inline stloc first use temp"
; V112 tmp81 [V112,T85] ( 9, 18 ) simd16 -> mm8 "Inline stloc first use temp"
; V113 tmp82 [V113,T89] ( 3, 6 ) simd16 -> mm9 "Inline stloc first use temp"
; V114 tmp83 [V114,T94] ( 2, 4 ) simd16 -> mm10 "Inline stloc first use temp"
; V115 tmp84 [V115,T95] ( 2, 4 ) simd16 -> mm11 "Inline stloc first use temp"
; V116 tmp85 [V116,T96] ( 2, 4 ) simd16 -> mm10 "Inline stloc first use temp"
; V117 tmp86 [V117,T97] ( 2, 4 ) simd16 -> mm9 "Inline stloc first use temp"
;* V118 tmp87 [V118 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V119 tmp88 [V119 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V120 tmp89 [V120 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V121 tmp90 [V121 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V122 tmp91 [V122 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V123 tmp92 [V123 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V124 tmp93 [V124 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V125 tmp94 [V125 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V126 tmp95 [V126 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V127 tmp96 [V127 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V128 tmp97 [V128 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V129 tmp98 [V129 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V130 tmp99 [V130 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V131 tmp100 [V131 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V132 tmp101 [V132 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V133 tmp102 [V133 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V134 tmp103 [V134 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V135 tmp104 [V135 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V136 tmp105 [V136 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V137 tmp106 [V137 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
; V138 tmp107 [V138,T117] ( 2, 1 ) simd16 -> mm4 "Inline return value spill temp"
; V139 tmp108 [V139,T118] ( 2, 1 ) simd16 -> mm4 "Inline stloc first use temp"
; V140 tmp109 [V140,T119] ( 2, 1 ) simd16 -> mm5 "Inline return value spill temp"
; V141 tmp110 [V141,T120] ( 2, 1 ) simd16 -> mm5 "Inline stloc first use temp"
;* V142 tmp111 [V142 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V143 tmp112 [V143 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V144 tmp113 [V144 ] ( 0, 0 ) struct (16) zero-ref "Inlining Arg"
;* V145 tmp114 [V145 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V146 tmp115 [V146 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;* V147 tmp116 [V147 ] ( 0, 0 ) struct (16) zero-ref "NewObj constructor temp"
;* V148 tmp117 [V148 ] ( 0, 0 ) struct ( 8) zero-ref "NewObj constructor temp"
;* V149 tmp118 [V149 ] ( 0, 0 ) struct (16) zero-ref ld-addr-op "Inlining Arg"
;* V150 tmp119 [V150 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V151 tmp120 [V151,T13] ( 2, 8 ) int -> rbx "Inline stloc first use temp"
; V152 tmp121 [V152,T14] ( 2, 8 ) int -> rdi "Inline stloc first use temp"
; V153 tmp122 [V153,T15] ( 2, 8 ) int -> r13 "Inline stloc first use temp"
; V154 tmp123 [V154,T16] ( 2, 8 ) int -> r11 "Inline stloc first use temp"
; V155 tmp124 [V155,T05] ( 2, 16 ) int -> r10 "impAppendStmt"
; V156 tmp125 [V156,T02] ( 6, 24 ) int -> rdi "Inline stloc first use temp"
; V157 tmp126 [V157,T04] ( 4, 16 ) int -> rbx "Inline stloc first use temp"
; V158 tmp127 [V158,T17] ( 2, 8 ) int -> r11 "Inline stloc first use temp"
;* V159 tmp128 [V159 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V160 tmp129 [V160,T06] ( 2, 16 ) long -> rbx "NewObj constructor temp"
;* V161 tmp130 [V161 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V162 tmp131 [V162 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V163 tmp132 [V163,T07] ( 2, 16 ) long -> rdi "NewObj constructor temp"
;* V164 tmp133 [V164 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V165 tmp134 [V165 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V166 tmp135 [V166,T08] ( 2, 16 ) long -> rbx "NewObj constructor temp"
;* V167 tmp136 [V167 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V168 tmp137 [V168 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V169 tmp138 [V169,T09] ( 2, 16 ) long -> r11 "NewObj constructor temp"
;* V170 tmp139 [V170 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V171 tmp140 [V171 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V172 tmp141 [V172 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V173 tmp142 [V173,T39] ( 2, 2 ) long -> rax "NewObj constructor temp"
;* V174 tmp143 [V174 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V175 tmp144 [V175 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V176 tmp145 [V176,T40] ( 2, 2 ) long -> rdi "NewObj constructor temp"
;* V177 tmp146 [V177 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V178 tmp147 [V178 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V179 tmp148 [V179,T41] ( 2, 2 ) long -> rdx "NewObj constructor temp"
;* V180 tmp149 [V180 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V181 tmp150 [V181 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V182 tmp151 [V182,T42] ( 2, 2 ) long -> r11 "NewObj constructor temp"
;* V183 tmp152 [V183 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V184 tmp153 [V184 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
;* V185 tmp154 [V185 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V186 tmp155 [V186,T43] ( 2, 2 ) long -> rdx "NewObj constructor temp"
;* V187 tmp156 [V187 ] ( 0, 0 ) long -> zero-ref "Inlining Arg"
; V188 tmp157 [V188,T44] ( 2, 2 ) int -> rax "Single return block return value"
;* V189 tmp158 [V189,T54] ( 0, 0 ) byref -> zero-ref V33._pointer(offs=0x00) P-INDEP "field V33._pointer (fldOffset=0x0)"
;* V190 tmp159 [V190 ] ( 0, 0 ) int -> zero-ref V33._length(offs=0x08) P-INDEP "field V33._length (fldOffset=0x8)"
; V191 tmp160 [V191,T47] ( 3, 1.50) byref -> rsi V34._pointer(offs=0x00) P-INDEP "field V34._pointer (fldOffset=0x0)"
;* V192 tmp161 [V192 ] ( 0, 0 ) int -> zero-ref V34._length(offs=0x08) P-INDEP "field V34._length (fldOffset=0x8)"
; V193 tmp162 [V193,T48] ( 3, 1.50) byref -> rcx V35._pointer(offs=0x00) P-INDEP "field V35._pointer (fldOffset=0x0)"
;* V194 tmp163 [V194 ] ( 0, 0 ) int -> zero-ref V35._length(offs=0x08) P-INDEP "field V35._length (fldOffset=0x8)"
;* V195 tmp164 [V195 ] ( 0, 0 ) byref -> zero-ref V36._pointer(offs=0x00) P-INDEP "field V36._pointer (fldOffset=0x0)"
;* V196 tmp165 [V196 ] ( 0, 0 ) int -> zero-ref V36._length(offs=0x08) P-INDEP "field V36._length (fldOffset=0x8)"
;* V197 tmp166 [V197 ] ( 0, 0 ) byref -> zero-ref V38._pointer(offs=0x00) P-INDEP "field V38._pointer (fldOffset=0x0)"
;* V198 tmp167 [V198 ] ( 0, 0 ) int -> zero-ref V38._length(offs=0x08) P-INDEP "field V38._length (fldOffset=0x8)"
;* V199 tmp168 [V199 ] ( 0, 0 ) byref -> zero-ref V40._pointer(offs=0x00) P-INDEP "field V40._pointer (fldOffset=0x0)"
;* V200 tmp169 [V200 ] ( 0, 0 ) int -> zero-ref V40._length(offs=0x08) P-INDEP "field V40._length (fldOffset=0x8)"
;* V201 tmp170 [V201 ] ( 0, 0 ) byref -> zero-ref V42._pointer(offs=0x00) P-INDEP "field V42._pointer (fldOffset=0x0)"
;* V202 tmp171 [V202 ] ( 0, 0 ) int -> zero-ref V42._length(offs=0x08) P-INDEP "field V42._length (fldOffset=0x8)"
;* V203 tmp172 [V203 ] ( 0, 0 ) byref -> zero-ref V48._pointer(offs=0x00) P-INDEP "field V48._pointer (fldOffset=0x0)"
;* V204 tmp173 [V204 ] ( 0, 0 ) int -> zero-ref V48._length(offs=0x08) P-INDEP "field V48._length (fldOffset=0x8)"
;* V205 tmp174 [V205 ] ( 0, 0 ) byref -> zero-ref V50._pointer(offs=0x00) P-INDEP "field V50._pointer (fldOffset=0x0)"
;* V206 tmp175 [V206 ] ( 0, 0 ) int -> zero-ref V50._length(offs=0x08) P-INDEP "field V50._length (fldOffset=0x8)"
;* V207 tmp176 [V207,T60] ( 0, 0 ) byref -> zero-ref V61._pointer(offs=0x00) P-INDEP "field V61._pointer (fldOffset=0x0)"
;* V208 tmp177 [V208 ] ( 0, 0 ) int -> zero-ref V61._length(offs=0x08) P-INDEP "field V61._length (fldOffset=0x8)"
;* V209 tmp178 [V209,T61] ( 0, 0 ) byref -> zero-ref V62._value(offs=0x00) P-INDEP "field V62._value (fldOffset=0x0)"
;* V210 tmp179 [V210 ] ( 0, 0 ) byref -> zero-ref V63._pointer(offs=0x00) P-INDEP "field V63._pointer (fldOffset=0x0)"
;* V211 tmp180 [V211 ] ( 0, 0 ) int -> zero-ref V63._length(offs=0x08) P-INDEP "field V63._length (fldOffset=0x8)"
;* V212 tmp181 [V212 ] ( 0, 0 ) byref -> zero-ref V64._pointer(offs=0x00) P-INDEP "field V64._pointer (fldOffset=0x0)"
;* V213 tmp182 [V213 ] ( 0, 0 ) int -> zero-ref V64._length(offs=0x08) P-INDEP "field V64._length (fldOffset=0x8)"
;* V214 tmp183 [V214,T62] ( 0, 0 ) byref -> zero-ref V66._pointer(offs=0x00) P-INDEP "field V66._pointer (fldOffset=0x0)"
;* V215 tmp184 [V215 ] ( 0, 0 ) int -> zero-ref V66._length(offs=0x08) P-INDEP "field V66._length (fldOffset=0x8)"
;* V216 tmp185 [V216,T63] ( 0, 0 ) byref -> zero-ref V67._value(offs=0x00) P-INDEP "field V67._value (fldOffset=0x0)"
;* V217 tmp186 [V217 ] ( 0, 0 ) byref -> zero-ref V68._pointer(offs=0x00) P-INDEP "field V68._pointer (fldOffset=0x0)"
;* V218 tmp187 [V218 ] ( 0, 0 ) int -> zero-ref V68._length(offs=0x08) P-INDEP "field V68._length (fldOffset=0x8)"
;* V219 tmp188 [V219 ] ( 0, 0 ) byref -> zero-ref V69._pointer(offs=0x00) P-INDEP "field V69._pointer (fldOffset=0x0)"
;* V220 tmp189 [V220 ] ( 0, 0 ) int -> zero-ref V69._length(offs=0x08) P-INDEP "field V69._length (fldOffset=0x8)"
;* V221 tmp190 [V221,T64] ( 0, 0 ) byref -> zero-ref V71._pointer(offs=0x00) P-INDEP "field V71._pointer (fldOffset=0x0)"
;* V222 tmp191 [V222 ] ( 0, 0 ) int -> zero-ref V71._length(offs=0x08) P-INDEP "field V71._length (fldOffset=0x8)"
;* V223 tmp192 [V223,T65] ( 0, 0 ) byref -> zero-ref V72._value(offs=0x00) P-INDEP "field V72._value (fldOffset=0x0)"
;* V224 tmp193 [V224 ] ( 0, 0 ) byref -> zero-ref V73._pointer(offs=0x00) P-INDEP "field V73._pointer (fldOffset=0x0)"
;* V225 tmp194 [V225 ] ( 0, 0 ) int -> zero-ref V73._length(offs=0x08) P-INDEP "field V73._length (fldOffset=0x8)"
;* V226 tmp195 [V226 ] ( 0, 0 ) byref -> zero-ref V74._pointer(offs=0x00) P-INDEP "field V74._pointer (fldOffset=0x0)"
;* V227 tmp196 [V227 ] ( 0, 0 ) int -> zero-ref V74._length(offs=0x08) P-INDEP "field V74._length (fldOffset=0x8)"
;* V228 tmp197 [V228,T66] ( 0, 0 ) byref -> zero-ref V76._pointer(offs=0x00) P-INDEP "field V76._pointer (fldOffset=0x0)"
;* V229 tmp198 [V229 ] ( 0, 0 ) int -> zero-ref V76._length(offs=0x08) P-INDEP "field V76._length (fldOffset=0x8)"
;* V230 tmp199 [V230,T67] ( 0, 0 ) byref -> zero-ref V77._value(offs=0x00) P-INDEP "field V77._value (fldOffset=0x0)"
;* V231 tmp200 [V231 ] ( 0, 0 ) byref -> zero-ref V78._pointer(offs=0x00) P-INDEP "field V78._pointer (fldOffset=0x0)"
;* V232 tmp201 [V232 ] ( 0, 0 ) int -> zero-ref V78._length(offs=0x08) P-INDEP "field V78._length (fldOffset=0x8)"
;* V233 tmp202 [V233 ] ( 0, 0 ) byref -> zero-ref V79._pointer(offs=0x00) P-INDEP "field V79._pointer (fldOffset=0x0)"
;* V234 tmp203 [V234 ] ( 0, 0 ) int -> zero-ref V79._length(offs=0x08) P-INDEP "field V79._length (fldOffset=0x8)"
;* V235 tmp204 [V235,T68] ( 0, 0 ) byref -> zero-ref V85._pointer(offs=0x00) P-INDEP "field V85._pointer (fldOffset=0x0)"
;* V236 tmp205 [V236 ] ( 0, 0 ) int -> zero-ref V85._length(offs=0x08) P-INDEP "field V85._length (fldOffset=0x8)"
;* V237 tmp206 [V237,T69] ( 0, 0 ) byref -> zero-ref V86._value(offs=0x00) P-INDEP "field V86._value (fldOffset=0x0)"
;* V238 tmp207 [V238 ] ( 0, 0 ) byref -> zero-ref V87._pointer(offs=0x00) P-INDEP "field V87._pointer (fldOffset=0x0)"
;* V239 tmp208 [V239 ] ( 0, 0 ) int -> zero-ref V87._length(offs=0x08) P-INDEP "field V87._length (fldOffset=0x8)"
;* V240 tmp209 [V240 ] ( 0, 0 ) byref -> zero-ref V88._pointer(offs=0x00) P-INDEP "field V88._pointer (fldOffset=0x0)"
;* V241 tmp210 [V241 ] ( 0, 0 ) int -> zero-ref V88._length(offs=0x08) P-INDEP "field V88._length (fldOffset=0x8)"
;* V242 tmp211 [V242,T70] ( 0, 0 ) byref -> zero-ref V90._pointer(offs=0x00) P-INDEP "field V90._pointer (fldOffset=0x0)"
;* V243 tmp212 [V243 ] ( 0, 0 ) int -> zero-ref V90._length(offs=0x08) P-INDEP "field V90._length (fldOffset=0x8)"
;* V244 tmp213 [V244,T71] ( 0, 0 ) byref -> zero-ref V91._value(offs=0x00) P-INDEP "field V91._value (fldOffset=0x0)"
;* V245 tmp214 [V245 ] ( 0, 0 ) byref -> zero-ref V92._pointer(offs=0x00) P-INDEP "field V92._pointer (fldOffset=0x0)"
;* V246 tmp215 [V246 ] ( 0, 0 ) int -> zero-ref V92._length(offs=0x08) P-INDEP "field V92._length (fldOffset=0x8)"
;* V247 tmp216 [V247 ] ( 0, 0 ) byref -> zero-ref V93._pointer(offs=0x00) P-INDEP "field V93._pointer (fldOffset=0x0)"
;* V248 tmp217 [V248 ] ( 0, 0 ) int -> zero-ref V93._length(offs=0x08) P-INDEP "field V93._length (fldOffset=0x8)"
;* V249 tmp218 [V249 ] ( 0, 0 ) byref -> zero-ref V95._pointer(offs=0x00) P-INDEP "field V95._pointer (fldOffset=0x0)"
;* V250 tmp219 [V250 ] ( 0, 0 ) int -> zero-ref V95._length(offs=0x08) P-INDEP "field V95._length (fldOffset=0x8)"
;* V251 tmp220 [V251 ] ( 0, 0 ) byref -> zero-ref V97._pointer(offs=0x00) P-INDEP "field V97._pointer (fldOffset=0x0)"
;* V252 tmp221 [V252 ] ( 0, 0 ) int -> zero-ref V97._length(offs=0x08) P-INDEP "field V97._length (fldOffset=0x8)"
;* V253 tmp222 [V253 ] ( 0, 0 ) byref -> zero-ref V99._pointer(offs=0x00) P-INDEP "field V99._pointer (fldOffset=0x0)"
;* V254 tmp223 [V254 ] ( 0, 0 ) int -> zero-ref V99._length(offs=0x08) P-INDEP "field V99._length (fldOffset=0x8)"
;* V255 tmp224 [V255 ] ( 0, 0 ) byref -> zero-ref V101._pointer(offs=0x00) P-INDEP "field V101._pointer (fldOffset=0x0)"
;* V256 tmp225 [V256 ] ( 0, 0 ) int -> zero-ref V101._length(offs=0x08) P-INDEP "field V101._length (fldOffset=0x8)"
;* V257 tmp226 [V257 ] ( 0, 0 ) byref -> zero-ref V107._pointer(offs=0x00) P-INDEP "field V107._pointer (fldOffset=0x0)"
;* V258 tmp227 [V258 ] ( 0, 0 ) int -> zero-ref V107._length(offs=0x08) P-INDEP "field V107._length (fldOffset=0x8)"
;* V259 tmp228 [V259,T72] ( 0, 0 ) byref -> zero-ref V118._pointer(offs=0x00) P-INDEP "field V118._pointer (fldOffset=0x0)"
;* V260 tmp229 [V260 ] ( 0, 0 ) int -> zero-ref V118._length(offs=0x08) P-INDEP "field V118._length (fldOffset=0x8)"
;* V261 tmp230 [V261,T73] ( 0, 0 ) byref -> zero-ref V119._value(offs=0x00) P-INDEP "field V119._value (fldOffset=0x0)"
;* V262 tmp231 [V262 ] ( 0, 0 ) byref -> zero-ref V120._pointer(offs=0x00) P-INDEP "field V120._pointer (fldOffset=0x0)"
;* V263 tmp232 [V263 ] ( 0, 0 ) int -> zero-ref V120._length(offs=0x08) P-INDEP "field V120._length (fldOffset=0x8)"
;* V264 tmp233 [V264 ] ( 0, 0 ) byref -> zero-ref V121._pointer(offs=0x00) P-INDEP "field V121._pointer (fldOffset=0x0)"
;* V265 tmp234 [V265 ] ( 0, 0 ) int -> zero-ref V121._length(offs=0x08) P-INDEP "field V121._length (fldOffset=0x8)"
;* V266 tmp235 [V266,T74] ( 0, 0 ) byref -> zero-ref V123._pointer(offs=0x00) P-INDEP "field V123._pointer (fldOffset=0x0)"
;* V267 tmp236 [V267 ] ( 0, 0 ) int -> zero-ref V123._length(offs=0x08) P-INDEP "field V123._length (fldOffset=0x8)"
;* V268 tmp237 [V268,T75] ( 0, 0 ) byref -> zero-ref V124._value(offs=0x00) P-INDEP "field V124._value (fldOffset=0x0)"
;* V269 tmp238 [V269 ] ( 0, 0 ) byref -> zero-ref V125._pointer(offs=0x00) P-INDEP "field V125._pointer (fldOffset=0x0)"
;* V270 tmp239 [V270 ] ( 0, 0 ) int -> zero-ref V125._length(offs=0x08) P-INDEP "field V125._length (fldOffset=0x8)"
;* V271 tmp240 [V271 ] ( 0, 0 ) byref -> zero-ref V126._pointer(offs=0x00) P-INDEP "field V126._pointer (fldOffset=0x0)"
;* V272 tmp241 [V272 ] ( 0, 0 ) int -> zero-ref V126._length(offs=0x08) P-INDEP "field V126._length (fldOffset=0x8)"
;* V273 tmp242 [V273,T76] ( 0, 0 ) byref -> zero-ref V128._pointer(offs=0x00) P-INDEP "field V128._pointer (fldOffset=0x0)"
;* V274 tmp243 [V274 ] ( 0, 0 ) int -> zero-ref V128._length(offs=0x08) P-INDEP "field V128._length (fldOffset=0x8)"
;* V275 tmp244 [V275,T77] ( 0, 0 ) byref -> zero-ref V129._value(offs=0x00) P-INDEP "field V129._value (fldOffset=0x0)"
;* V276 tmp245 [V276 ] ( 0, 0 ) byref -> zero-ref V130._pointer(offs=0x00) P-INDEP "field V130._pointer (fldOffset=0x0)"
;* V277 tmp246 [V277 ] ( 0, 0 ) int -> zero-ref V130._length(offs=0x08) P-INDEP "field V130._length (fldOffset=0x8)"
;* V278 tmp247 [V278 ] ( 0, 0 ) byref -> zero-ref V131._pointer(offs=0x00) P-INDEP "field V131._pointer (fldOffset=0x0)"
;* V279 tmp248 [V279 ] ( 0, 0 ) int -> zero-ref V131._length(offs=0x08) P-INDEP "field V131._length (fldOffset=0x8)"
;* V280 tmp249 [V280,T78] ( 0, 0 ) byref -> zero-ref V133._pointer(offs=0x00) P-INDEP "field V133._pointer (fldOffset=0x0)"
;* V281 tmp250 [V281 ] ( 0, 0 ) int -> zero-ref V133._length(offs=0x08) P-INDEP "field V133._length (fldOffset=0x8)"
;* V282 tmp251 [V282,T79] ( 0, 0 ) byref -> zero-ref V134._value(offs=0x00) P-INDEP "field V134._value (fldOffset=0x0)"
;* V283 tmp252 [V283 ] ( 0, 0 ) byref -> zero-ref V135._pointer(offs=0x00) P-INDEP "field V135._pointer (fldOffset=0x0)"
;* V284 tmp253 [V284 ] ( 0, 0 ) int -> zero-ref V135._length(offs=0x08) P-INDEP "field V135._length (fldOffset=0x8)"
;* V285 tmp254 [V285 ] ( 0, 0 ) byref -> zero-ref V136._pointer(offs=0x00) P-INDEP "field V136._pointer (fldOffset=0x0)"
;* V286 tmp255 [V286 ] ( 0, 0 ) int -> zero-ref V136._length(offs=0x08) P-INDEP "field V136._length (fldOffset=0x8)"
;* V287 tmp256 [V287,T80] ( 0, 0 ) byref -> zero-ref V142._pointer(offs=0x00) P-INDEP "field V142._pointer (fldOffset=0x0)"
;* V288 tmp257 [V288 ] ( 0, 0 ) int -> zero-ref V142._length(offs=0x08) P-INDEP "field V142._length (fldOffset=0x8)"
;* V289 tmp258 [V289,T81] ( 0, 0 ) byref -> zero-ref V143._value(offs=0x00) P-INDEP "field V143._value (fldOffset=0x0)"
;* V290 tmp259 [V290 ] ( 0, 0 ) byref -> zero-ref V144._pointer(offs=0x00) P-INDEP "field V144._pointer (fldOffset=0x0)"
;* V291 tmp260 [V291 ] ( 0, 0 ) int -> zero-ref V144._length(offs=0x08) P-INDEP "field V144._length (fldOffset=0x8)"
;* V292 tmp261 [V292 ] ( 0, 0 ) byref -> zero-ref V145._pointer(offs=0x00) P-INDEP "field V145._pointer (fldOffset=0x0)"
;* V293 tmp262 [V293 ] ( 0, 0 ) int -> zero-ref V145._length(offs=0x08) P-INDEP "field V145._length (fldOffset=0x8)"
;* V294 tmp263 [V294,T55] ( 0, 0 ) byref -> zero-ref V147._pointer(offs=0x00) P-INDEP "field V147._pointer (fldOffset=0x0)"
;* V295 tmp264 [V295 ] ( 0, 0 ) int -> zero-ref V147._length(offs=0x08) P-INDEP "field V147._length (fldOffset=0x8)"
;* V296 tmp265 [V296,T82] ( 0, 0 ) byref -> zero-ref V148._value(offs=0x00) P-INDEP "field V148._value (fldOffset=0x0)"
;* V297 tmp266 [V297,T83] ( 0, 0 ) byref -> zero-ref V149._pointer(offs=0x00) P-INDEP "field V149._pointer (fldOffset=0x0)"
;* V298 tmp267 [V298 ] ( 0, 0 ) int -> zero-ref V149._length(offs=0x08) P-INDEP "field V149._length (fldOffset=0x8)"
; V299 tmp268 [V299,T37] ( 2, 2 ) byref -> rax "BlockOp address local"
; V300 tmp269 [V300,T45] ( 2, 2 ) long -> rsi "Cast away GC"
; V301 tmp270 [V301,T38] ( 2, 2 ) byref -> rax "BlockOp address local"
; V302 tmp271 [V302,T46] ( 2, 2 ) long -> rcx "Cast away GC"
; V303 rat0 [V303,T30] ( 3, 3 ) int -> rdx "ReplaceWithLclVar is creating a new local variable"
; V304 rat1 [V304,T31] ( 3, 3 ) int -> rdx "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 72
G_M25171_IG01:
55 push rbp
4157 push r15
4156 push r14
4155 push r13
4154 push r12
53 push rbx
4883EC48 sub rsp, 72
C5F877 vzeroupper
488D6C2470 lea rbp, [rsp+70H]
33C0 xor rax, rax
488945B0 mov qword ptr [rbp-50H], rax
488945A8 mov qword ptr [rbp-58H], rax
48897DC8 mov bword ptr [rbp-38H], rdi
488975D0 mov qword ptr [rbp-30H], rsi
488955B8 mov bword ptr [rbp-48H], rdx
48894DC0 mov qword ptr [rbp-40H], rcx
8B7D10 mov edi, dword ptr [rbp+10H]
G_M25171_IG02:
837DD000 cmp dword ptr [rbp-30H], 0
770D ja SHORT G_M25171_IG03
33C0 xor eax, eax
418900 mov dword ptr [r8], eax
418901 mov dword ptr [r9], eax
E982040000 jmp G_M25171_IG23
G_M25171_IG03:
488D45C8 lea rax, bword ptr [rbp-38H]
488B30 mov rsi, bword ptr [rax]
488975B0 mov bword ptr [rbp-50H], rsi
488D45B8 lea rax, bword ptr [rbp-48H]
488B08 mov rcx, bword ptr [rax]
48894DA8 mov bword ptr [rbp-58H], rcx
448B55D0 mov r10d, dword ptr [rbp-30H]
4183E2FC and r10d, -4
448B5DC0 mov r11d, dword ptr [rbp-40H]
418BC2 mov eax, r10d
85C0 test eax, eax
0F8CF9040000 jl G_M25171_IG31
G_M25171_IG04:
8BD0 mov edx, eax
C1FA02 sar edx, 2
8D1C52 lea ebx, [rdx+2*rdx]
8D53FE lea edx, [rbx-2]
443BDA cmp r11d, edx
7D14 jge SHORT G_M25171_IG05
BA56555555 mov edx, 0x55555556
8BC2 mov eax, edx
41F7EB imul edx:eax, r11d
8BC2 mov eax, edx
C1E81F shr eax, 31
03C2 add eax, edx
C1E002 shl eax, 2
G_M25171_IG05:
4C8BF6 mov r14, rsi
4C8BF9 mov r15, rcx
458BE2 mov r12d, r10d
4D03E6 add r12, r14
8BD0 mov edx, eax
4903D6 add rdx, r14
83F818 cmp eax, 24
0F8CF1010000 jl G_M25171_IG11
488D42D3 lea rax, [rdx-45]
483BC6 cmp rax, rsi
0F82F7000000 jb G_M25171_IG08
49BED7DD95C2D27F0000 mov r14, 0x7FD2C295DDD7
C4C17D1006 vmovupd ymm0, ymmword ptr[r14]
49BFE7DB95C2D27F0000 mov r15, 0x7FD2C295DBE7
C4C17D100F vmovupd ymm1, ymmword ptr[r15]
49BE07DC95C2D27F0000 mov r14, 0x7FD2C295DC07
C4C17D1016 vmovupd ymm2, ymmword ptr[r14]
49BE87DD95C2D27F0000 mov r14, 0x7FD2C295DD87
C4C17D101E vmovupd ymm3, ymmword ptr[r14]
41BE40014001 mov r14d, 0x1400140
C4C1796EE6 vmovd xmm4, r14d
C4E27D58E4 vpbroadcastd ymm4, ymm4
41BE00100100 mov r14d, 0x11000
C4C1796EEE vmovd xmm5, r14d
C4E27D58ED vpbroadcastd ymm5, ymm5
49BE5FDE95C2D27F0000 mov r14, 0x7FD2C295DE5F
C4C17D1036 vmovupd ymm6, ymmword ptr[r14]
49BEA7DE95C2D27F0000 mov r14, 0x7FD2C295DEA7
C4C17D103E vmovupd ymm7, ymmword ptr[r14]
4C8BF6 mov r14, rsi
4C8BF9 mov r15, rcx
G_M25171_IG06:
C4417E6F06 vmovdqu ymm8, ymmword ptr[r14]
C4C13572D004 vpsrld ymm9, ymm8, 4
C535DBCB vpand ymm9, ymm9, ymm3
C53DDBD3 vpand ymm10, ymm8, ymm3
C4427D00D9 vpshufb ymm11, ymm0, ymm9
C4427500D2 vpshufb ymm10, ymm1, ymm10
C4427D17D3 vptest ymm10, ymm11
410F94C5 sete r13b
450FB6ED movzx r13, r13b
4585ED test r13d, r13d
743D je SHORT G_M25171_IG07
C53D74D3 vpcmpeqb ymm10, ymm8, ymm3
C4412DFCC9 vpaddb ymm9, ymm10, ymm9
C4426D00C9 vpshufb ymm9, ymm2, ymm9
C4413DFCC1 vpaddb ymm8, ymm8, ymm9
C4623D04C4 vpmaddubsw ymm8, ymm8, ymm4
C53DF5C5 vpmaddwd ymm8, ymm8, ymm5
C4623D00C6 vpshufb ymm8, ymm8, ymm6
C4424536C0 vpermd ymm8, ymm7, ymm8
C4417E7F07 vmovdqu ymmword ptr[r15], ymm8
4983C620 add r14, 32
4983C718 add r15, 24
4C3BF0 cmp r14, rax
7699 jbe SHORT G_M25171_IG06
G_M25171_IG07:
4D3BF4 cmp r14, r12
0F8401030000 je G_M25171_IG22
G_M25171_IG08:
488D42E8 lea rax, [rdx-24]
493BC6 cmp rax, r14
0F82E0000000 jb G_M25171_IG11
48BAA7DD95C2D27F0000 mov rdx, 0x7FD2C295DDA7
C5F91002 vmovupd xmm0, xmmword ptr [rdx]
48BA2FDE95C2D27F0000 mov rdx, 0x7FD2C295DE2F
C5F9100A vmovupd xmm1, xmmword ptr [rdx]
48BAFFDD95C2D27F0000 mov rdx, 0x7FD2C295DDFF
C5F91012 vmovupd xmm2, xmmword ptr [rdx]
48BAC7DD95C2D27F0000 mov rdx, 0x7FD2C295DDC7
C5F9101A vmovupd xmm3, xmmword ptr [rdx]
BA40014001 mov edx, 0x1400140
C5F96EE2 vmovd xmm4, edx
C4E27958E4 vpbroadcastd xmm4, xmm4
BA00100100 mov edx, 0x11000
C5F96EEA vmovd xmm5, edx
C4E27958ED vpbroadcastd xmm5, xmm5
48BA4FDE95C2D27F0000 mov rdx, 0x7FD2C295DE4F
C5F91032 vmovupd xmm6, xmmword ptr [rdx]
C5C057FF vxorps xmm7, xmm7, xmm7
G_M25171_IG09:
C4417A6F06 vmovdqu xmm8, xmmword ptr [r14]
C4C13172D004 vpsrld xmm9, xmm8, 4
C531DBCB vpand xmm9, xmm9, xmm3
C539DBD3 vpand xmm10, xmm8, xmm3
C4427900D9 vpshufb xmm11, xmm0, xmm9
C4427100D2 vpshufb xmm10, xmm1, xmm10
C44129DBD3 vpand xmm10, xmm10, xmm11
C52964D7 vpcmpgtb xmm10, xmm10, xmm7
C4C179D7D2 vpmovmskb edx, xmm10
85D2 test edx, edx
7538 jne SHORT G_M25171_IG10
C53974D3 vpcmpeqb xmm10, xmm8, xmm3
C44129FCC9 vpaddb xmm9, xmm10, xmm9
C4426900C9 vpshufb xmm9, xmm2, xmm9
C44139FCC1 vpaddb xmm8, xmm8, xmm9
C4623904C4 vpmaddubsw xmm8, xmm8, xmm4
C539F5C5 vpmaddwd xmm8, xmm8, xmm5
C4623900C6 vpshufb xmm8, xmm8, xmm6
C4417A7F07 vmovdqu xmmword ptr [r15], xmm8
4983C610 add r14, 16
4983C70C add r15, 12
4C3BF0 cmp r14, rax
769E jbe SHORT G_M25171_IG09
G_M25171_IG10:
4D3BF4 cmp r14, r12
0F8414020000 je G_M25171_IG22
G_M25171_IG11:
897D10 mov dword ptr [rbp+10H], edi
4084FF test dil, dil
7505 jne SHORT G_M25171_IG12
4533ED xor r13d, r13d
EB06 jmp SHORT G_M25171_IG13
G_M25171_IG12:
41BD04000000 mov r13d, 4
G_M25171_IG13:
443BDB cmp r11d, ebx
7C10 jl SHORT G_M25171_IG14
448955A4 mov dword ptr [rbp-5CH], r10d
44896D9C mov dword ptr [rbp-64H], r13d
418BC2 mov eax, r10d
412BC5 sub eax, r13d
EB24 jmp SHORT G_M25171_IG15
G_M25171_IG14:
BA56555555 mov edx, 0x55555556
44895DA0 mov dword ptr [rbp-60H], r11d
8BC2 mov eax, edx
41F7EB imul edx:eax, r11d
8BC2 mov eax, edx
C1E81F shr eax, 31
03C2 add eax, edx
C1E002 shl eax, 2
448955A4 mov dword ptr [rbp-5CH], r10d
44896D9C mov dword ptr [rbp-64H], r13d
448B5DA0 mov r11d, dword ptr [rbp-60H]
G_M25171_IG15:
8BD0 mov edx, eax
4803D6 add rdx, rsi
4C3BF2 cmp r14, rdx
44895DA0 mov dword ptr [rbp-60H], r11d
0F8392000000 jae G_M25171_IG17
G_M25171_IG16:
410FB61E movzx rbx, byte ptr [r14]
410FB67E01 movzx rdi, byte ptr [r14+1]
450FB66E02 movzx r13, byte ptr [r14+2]
450FB65E03 movzx r11, byte ptr [r14+3]
8BDB mov ebx, ebx
49BA67DC95C2D27F0000 mov r10, 0x7FD2C295DC67
4E0FBE1413 movsx r10, byte ptr [rbx+r10]
8BFF mov edi, edi
48BB67DC95C2D27F0000 mov rbx, 0x7FD2C295DC67
480FBE3C1F movsx rdi, byte ptr [rdi+rbx]
418BDD mov ebx, r13d
49BD67DC95C2D27F0000 mov r13, 0x7FD2C295DC67
4A0FBE1C2B movsx rbx, byte ptr [rbx+r13]
458BDB mov r11d, r11d
4F0FBE1C2B movsx r11, byte ptr [r11+r13]
C1E70C shl edi, 12
C1E306 shl ebx, 6
0BFB or edi, ebx
41C1E212 shl r10d, 18
450BD3 or r10d, r11d
440BD7 or r10d, edi
4585D2 test r10d, r10d
0F8CD9010000 jl G_M25171_IG29
418BFA mov edi, r10d
C1FF10 sar edi, 16
41883F mov byte ptr [r15], dil
418BFA mov edi, r10d
C1FF08 sar edi, 8
41887F01 mov byte ptr [r15+1], dil
45885702 mov byte ptr [r15+2], r10b
4983C604 add r14, 4
4983C703 add r15, 3
4C3BF2 cmp r14, rdx
0F826EFFFFFF jb G_M25171_IG16
G_M25171_IG17:
448B55A4 mov r10d, dword ptr [rbp-5CH]
418BFA mov edi, r10d
2B7D9C sub edi, dword ptr [rbp-64H]
3BF8 cmp edi, eax
0F8538010000 jne G_M25171_IG25
4D3BF4 cmp r14, r12
750F jne SHORT G_M25171_IG18
807D1000 cmp byte ptr [rbp+10H], 0
0F8467010000 je G_M25171_IG27
E98B010000 jmp G_M25171_IG29
G_M25171_IG18:
410FB64424FC movzx rax, byte ptr [r12-4]
410FB67C24FD movzx rdi, byte ptr [r12-3]
410FB65424FE movzx rdx, byte ptr [r12-2]
450FB65C24FF movzx r11, byte ptr [r12-1]
8BC0 mov eax, eax
48BB67DC95C2D27F0000 mov rbx, 0x7FD2C295DC67
480FBE0418 movsx rax, byte ptr [rax+rbx]
8BFF mov edi, edi
480FBE3C1F movsx rdi, byte ptr [rdi+rbx]
C1E012 shl eax, 18
C1E70C shl edi, 12
0BC7 or eax, edi
8B7DA0 mov edi, dword ptr [rbp-60H]
4803F9 add rdi, rcx
4183FB3D cmp r11d, 61
7451 je SHORT G_M25171_IG19
8BD2 mov edx, edx
48BB67DC95C2D27F0000 mov rbx, 0x7FD2C295DC67
480FBE141A movsx rdx, byte ptr [rdx+rbx]
458BDB mov r11d, r11d
4D0FBE1C1B movsx r11, byte ptr [r11+rbx]
C1E206 shl edx, 6
410BC3 or eax, r11d
0BC2 or eax, edx
85C0 test eax, eax
0F8C1E010000 jl G_M25171_IG29
498D5703 lea rdx, [r15+3]
483BD7 cmp rdx, rdi
0F87AA000000 ja G_M25171_IG25
8BF8 mov edi, eax
C1FF10 sar edi, 16
41883F mov byte ptr [r15], dil
8BF8 mov edi, eax
C1FF08 sar edi, 8
41887F01 mov byte ptr [r15+1], dil
41884702 mov byte ptr [r15+2], al
4983C703 add r15, 3
EB5B jmp SHORT G_M25171_IG21
G_M25171_IG19:
83FA3D cmp edx, 61
743C je SHORT G_M25171_IG20
8BD2 mov edx, edx
49BB67DC95C2D27F0000 mov r11, 0x7FD2C295DC67
4A0FBE141A movsx rdx, byte ptr [rdx+r11]
C1E206 shl edx, 6
0BC2 or eax, edx
85C0 test eax, eax
0F8CD3000000 jl G_M25171_IG29
498D5702 lea rdx, [r15+2]
483BD7 cmp rdx, rdi
7763 ja SHORT G_M25171_IG25
8BF8 mov edi, eax
C1FF10 sar edi, 16
41883F mov byte ptr [r15], dil
C1F808 sar eax, 8
41884701 mov byte ptr [r15+1], al
4983C702 add r15, 2
EB1A jmp SHORT G_M25171_IG21
G_M25171_IG20:
85C0 test eax, eax
0F8CAD000000 jl G_M25171_IG29
498D5701 lea rdx, [r15+1]
483BD7 cmp rdx, rdi
773D ja SHORT G_M25171_IG25
C1F810 sar eax, 16
418807 mov byte ptr [r15], al
49FFC7 inc r15
G_M25171_IG21:
4983C604 add r14, 4
443B55D0 cmp r10d, dword ptr [rbp-30H]
0F858D000000 jne G_M25171_IG29
G_M25171_IG22:
498BC6 mov rax, r14
482BC6 sub rax, rsi
418900 mov dword ptr [r8], eax
498BC7 mov rax, r15
482BC1 sub rax, rcx
418901 mov dword ptr [r9], eax
G_M25171_IG23:
33C0 xor eax, eax
G_M25171_IG24:
C5F877 vzeroupper
488D65D8 lea rsp, [rbp-28H]
5B pop rbx
415C pop r12
415D pop r13
415E pop r14
415F pop r15
5D pop rbp
C3 ret
G_M25171_IG25:
443B55D0 cmp r10d, dword ptr [rbp-30H]
0F95C0 setne al
0FB6C0 movzx rax, al
8B7D10 mov edi, dword ptr [rbp+10H]
400FB6FF movzx rdi, dil
85C7 test eax, edi
7552 jne SHORT G_M25171_IG29
498BC6 mov rax, r14
482BC6 sub rax, rsi
418900 mov dword ptr [r8], eax
498BC7 mov rax, r15
482BC1 sub rax, rcx
418901 mov dword ptr [r9], eax
B801000000 mov eax, 1
G_M25171_IG26:
C5F877 vzeroupper
488D65D8 lea rsp, [rbp-28H]
5B pop rbx
415C pop r12
415D pop r13
415E pop r14
415F pop r15
5D pop rbp
C3 ret
G_M25171_IG27:
498BC6 mov rax, r14
482BC6 sub rax, rsi
418900 mov dword ptr [r8], eax
498BC7 mov rax, r15
482BC1 sub rax, rcx
418901 mov dword ptr [r9], eax
B802000000 mov eax, 2
G_M25171_IG28:
C5F877 vzeroupper
488D65D8 lea rsp, [rbp-28H]
5B pop rbx
415C pop r12
415D pop r13
415E pop r14
415F pop r15
5D pop rbp
C3 ret
G_M25171_IG29:
498BC6 mov rax, r14
482BC6 sub rax, rsi
418900 mov dword ptr [r8], eax
498BC7 mov rax, r15
482BC1 sub rax, rcx
418901 mov dword ptr [r9], eax
B803000000 mov eax, 3
G_M25171_IG30:
C5F877 vzeroupper
488D65D8 lea rsp, [rbp-28H]
5B pop rbx
415C pop r12
415D pop r13
415E pop r14
415F pop r15
5D pop rbp
C3 ret
G_M25171_IG31:
33FF xor edi, edi
E84BE7FFFF call ThrowHelper:ThrowArgumentOutOfRangeException(int)
CC int3
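; Total bytes of code 1374, prolog size 51 for method Base64:DecodeFromUtf8(struct,struct,byref,byref,bool):int
; ============================================================
Sorry for some noise with force-pushing. Had copy & paste errors from the benchmark-project...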
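Reading the disassembly above, the prologue (G_M25171_IG03/IG04) appears to limit the input to whole 4-character quads and to what fits in the destination. The scalar loop G_M25171_IG16 then decodes one quad per iteration: four sign-extending lookups (`movsx`) into the decoding map, shifts by 18/12/6, an `or`, and a single sign test (`test r10d, r10d` / `jl`) to catch any invalid character. A rough C sketch of those two pieces follows. It assumes the map holds -1 for bytes outside the standard alphabet, and `decoding_map`, `init_decoding_map`, `max_src_length` and `decode_quad` are hypothetical names, not the corefx code.

```c
#include <stdint.h>

/* Hypothetical decoding map: -1 for bytes outside the standard base64
 * alphabet, otherwise the 6-bit value. Filled at runtime to keep it short. */
static int8_t decoding_map[256];

static void init_decoding_map(void)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    for (int i = 0; i < 256; i++)
        decoding_map[i] = -1;
    for (int i = 0; i < 64; i++)
        decoding_map[(unsigned char)alphabet[i]] = (int8_t)i;
}

/* Mirrors IG03/IG04 as read from the asm: only whole quads are decoded, and
 * if the destination is too small even allowing for two padding characters,
 * the source is limited to what fits. Assumes src_length >= 0. */
static int max_src_length(int src_length, int dest_length)
{
    int src = src_length & ~3;        /* and r10d, -4 */
    int decoded = (src >> 2) * 3;     /* sar edx, 2 / lea ebx, [rdx+2*rdx] */
    if (dest_length < decoded - 2)    /* lea edx, [rbx-2] / cmp r11d, edx */
        src = dest_length / 3 * 4;    /* imul by 0x55555556 is a signed /3; shl eax, 2 */
    return src;
}

/* Mirrors IG16: decodes one 4-character quad into 3 bytes.
 * Returns 0 on success, -1 if any character is invalid. */
static int decode_quad(const uint8_t *src, uint8_t *dest)
{
    /* Sign-extending loads (movsx): an invalid character becomes -1. */
    int32_t i0 = decoding_map[src[0]];
    int32_t i1 = decoding_map[src[1]];
    int32_t i2 = decoding_map[src[2]];
    int32_t i3 = decoding_map[src[3]];

    /* Combine in unsigned arithmetic; any -1 sets bit 31. */
    uint32_t v = ((uint32_t)i0 << 18) | ((uint32_t)i1 << 12)
               | ((uint32_t)i2 << 6) | (uint32_t)i3;
    if (v & 0x80000000u)              /* test r10d, r10d / jl */
        return -1;

    dest[0] = (uint8_t)(v >> 16);
    dest[1] = (uint8_t)(v >> 8);
    dest[2] = (uint8_t)v;
    return 0;
}
```

A single sign test suffices because every valid value fits in 24 bits, so only a -1 lookup can reach bit 31.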
; ============================================================ Sorry for some noise with force-pushing. Had copy & paste errors from the benchmark-project... |
Improved the version done in c8b6cb3, so the static data isn't needed and code is more compact and readable.
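The improvement above concerns the 0x2F byte mask (named mask2F in the commit log). In the disassembly it is still loaded from static data into `ymm3`/`xmm3`, where it isolates the nibble indices for the `vpshufb` lookups and, via `vpcmpeqb`, detects '/' characters. As a rough sketch of what one 16-byte SSSE3 step (G_M25171_IG09) computes, here is a scalar C emulation using the lookup tables of the SSSE3 decoder in aklomp/base64 as I understand them. The names `lut_lo`, `lut_hi`, `lut_roll` and `decode16_emulated` are illustrative, and the table values should be checked against the actual source. The AVX2 loop (G_M25171_IG06) appears to do the same per 128-bit lane and then uses `vpermd` to gather both 12-byte results, storing 32 bytes but advancing the destination by only 24 (`add r15, 24`).

```c
#include <stdint.h>

/* Validation tables: a character is invalid iff lut_lo[lo] & lut_hi[hi] != 0. */
static const uint8_t lut_lo[16] = {
    0x15, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11, 0x11,
    0x11, 0x11, 0x13, 0x1A, 0x1B, 0x1B, 0x1B, 0x1A };
static const uint8_t lut_hi[16] = {
    0x10, 0x10, 0x01, 0x02, 0x04, 0x08, 0x04, 0x08,
    0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10 };
/* Offset added per high nibble; index 1 is reserved for '/'. */
static const int8_t lut_roll[16] = {
    0, 16, 19, 4, -65, -65, -71, -71, 0, 0, 0, 0, 0, 0, 0, 0 };

/* Scalar emulation of one 16-byte decode step: validates 16 characters and
 * packs them into 12 bytes. Returns 0 on success, -1 on invalid input. */
static int decode16_emulated(const uint8_t src[16], uint8_t dest[12])
{
    uint8_t v[16];
    uint8_t invalid = 0;

    for (int i = 0; i < 16; i++) {
        uint8_t c = src[i];
        /* The SIMD code masks with 0x2F; pshufb only uses the low 4 bits
         * (and bit 7, which 0x2F clears), so plain nibbles are equivalent. */
        uint8_t hi = c >> 4;
        uint8_t lo = c & 0x0F;
        invalid |= lut_lo[lo] & lut_hi[hi];      /* 2x vpshufb, then vptest/vpand */
        /* vpcmpeqb with 0x2F yields -1 for '/', moving it to roll index 1. */
        uint8_t idx = (uint8_t)(hi - (c == 0x2F));
        v[i] = (uint8_t)(c + lut_roll[idx]);     /* vpshufb + vpaddb */
    }
    if (invalid)
        return -1;

    for (int q = 0; q < 4; q++) {
        /* vpmaddubsw with 0x01400140: adjacent pairs -> 12-bit values */
        uint32_t w0 = (uint32_t)v[4 * q] * 0x40 + v[4 * q + 1];
        uint32_t w1 = (uint32_t)v[4 * q + 2] * 0x40 + v[4 * q + 3];
        /* vpmaddwd with 0x00011000: two 12-bit values -> one 24-bit value */
        uint32_t x = w0 * 0x1000 + w1;
        /* the final vpshufb writes each 24-bit value big-endian */
        dest[3 * q]     = (uint8_t)(x >> 16);
        dest[3 * q + 1] = (uint8_t)(x >> 8);
        dest[3 * q + 2] = (uint8_t)x;
    }
    return 0;
}
```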
LGTM. I think this can be merged once tests complete.
Thanks @gfoidl! 🎉
* Optimized scalar code-path
* Fixed label names
* Implemented vectorized versions
* Added reference to source of algorithm
* Added back missing namespace
* Unsafe.Add instead of Unsafe.Subtract
  Fixed build-failure (https://ci3.dot.net/job/dotnet_corefx/job/master/job/linux-musl-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/8247/console)
  Seems like the internal Unsafe doesn't have a Subtract method, so use Add instead.
* Added THIRD-PARTY-NOTICES
* PR Feedback
* THIRD-PARTY-NOTICES in repo-base instead of in folder
  Cf. dotnet/corefx#34529 (comment)
* PR Feedback
  * dotnet/corefx#34529 (comment)
  * dotnet/corefx#34529 (comment)
* Rewritten to use raw-pointers instead of GC-tracked refs
  Cf. dotnet/corefx#34529 (comment)
* Initialized the static fields directly (i.e. w/o cctor)
  Cf. dotnet/corefx#34529 (comment)
* Added a test for decoding an (encoded) Guid
  The case of decoding 16 encoded bytes was not covered by tests, so incorrect code was committed before, resulting in DestinationTooSmall instead of the correct Done.
* EncodingMap / DecodingMap as byref instead of pointer
  So got rid of the `rep stosd` in the prolog.
  Cf. dotnet/corefx#34529 (comment)
* PR Feedback
  * dotnet/corefx#34529 (comment)
* Debug.Fail instead of throwing for the assertion
  Cf. dotnet/corefx#34529 (comment)
* ROSpan for static data
* ROS for lookup maps
* In decode avoided stack spill and hoisted zero-vector outside the loops
  Cf. dotnet/corefx#34529 (comment)
* Assert assumption about destLength
  Cf. dotnet/corefx#34529 (comment)
* Added comments from original source and some changes to variable names
  Cf. dotnet/corefx#34529 (comment) and dotnet/corefx#34529 (comment)
* Use TestZ instead of MoveMask in AVX2-path
  Cf. dotnet/corefx#34529 (comment)
* Fixed too complicated mask2F creation
  Improved the version done in dotnet/corefx@c8b6cb3, so the static data isn't needed and code is more compact and readable.

Commit migrated from dotnet/corefx@036e0a6
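The Guid test in the commit log covers a case where the exact decoded length matters: 16 bytes encode to 4 * ceil(16 / 3) = 24 characters ending in "==", so the quick estimate 24 / 4 * 3 = 18 overshoots by 2, and a decoder has to accept a destination of exactly 16 bytes. The small C helpers below (hypothetical names, standard padded alphabet assumed) only spell out that arithmetic; they are not the corefx code.

```c
#include <stddef.h>

/* Length of the padded encoding of n bytes: 4 * ceil(n / 3). */
static size_t encoded_length(size_t n)
{
    return (n + 2) / 3 * 4;
}

/* Exact decoded length of a padded encoding of length len (a multiple of 4),
 * given its last two characters; '=' marks padding. */
static size_t decoded_length(size_t len, char second_last, char last)
{
    size_t n = len / 4 * 3;   /* upper bound, e.g. 24 -> 18 */
    if (n > 0 && last == '=') {
        n--;                  /* one padding character */
        if (second_last == '=')
            n--;              /* two padding characters */
    }
    return n;
}
```

For the benchmark sizes used in the PR description, 5, 16 and 1000 bytes encode to 8, 24 and 1336 characters respectively.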
Description
Fixes https://github.com/dotnet/corefx/issues/32365
The code is based on and inspired by the C code from https://github.com/aklomp/base64, which is licensed under the BSD 2-Clause "Simplified" License.
The articles "Base64 encoding with SIMD instructions" and "Base64 decoding with SIMD instructions" outline the algorithm, which is not very intuitive.
I kept the variable names, etc. as close as possible to the original code.
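One of the less intuitive parts of the SIMD encoders described in those articles is turning 6-bit values into ASCII without a 64-entry table. As I understand the approach, a saturating subtract and a compare produce a 4-bit index, `pshufb` looks up a per-range offset, and a byte add applies it. A per-byte C emulation of that translation step might look like this. The offsets follow from the standard alphabet ('A' - 0 = 65, 'a' - 26 = 71, '0' - 52 = -4, '+' - 62 = -19, '/' - 63 = -16), and `enc_lut` / `enc_translate` are illustrative names.

```c
#include <stdint.h>

/* Offset per range: index 0 -> 'A'..'Z', 1 -> 'a'..'z',
 * 2..11 -> '0'..'9', 12 -> '+', 13 -> '/'. */
static const int8_t enc_lut[16] = {
    65, 71, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -19, -16, 0, 0 };

/* Per-byte emulation of the vector translation step; v must be 0..63. */
static uint8_t enc_translate(uint8_t v)
{
    /* Saturating subtract (psubusb v, 51): 0 for 0..51, 1..12 for 52..63. */
    uint8_t idx = v > 51 ? (uint8_t)(v - 51) : 0;
    /* Signed compare (pcmpgtb v, 25) gives -1; subtracting that mask adds 1. */
    if (v > 25)
        idx++;
    /* pshufb selects the offset, paddb applies it. */
    return (uint8_t)(v + enc_lut[idx]);
}
```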
A version for `Convert.ToBase64String` is done in dotnet/coreclr#21833.
Benchmarks
As mentioned in https://github.com/dotnet/corefx/issues/32365#issuecomment-443420296, I've created a separate package for base64 encoding / decoding (the main motivation was base64url support, and playing with the intrinsics). The code here is more or less an adaptation of that code for corefx, but in essence it is the same (at least after JITing, the "work" code is the same).
Therefore I'll show the perf-numbers based on that code.
The benchmarks were done with input sizes of 5 (minimal, to measure the overhead), 16 (e.g. a Guid) and 1000.
HardwareIntrinsicsCustomConfig is used to run the benchmarks with AVX2, with SSSE3, and with pure scalar code.
Summary of results
I'll give a brief summary of the results, as the table is quite large.
For encoding, speedups from 10% to 1000% and more are reported (the longer the input, the greater the speedup). In the scalar case this is mainly due to the elimination of `movsxd` from the loop.
Decoding doesn't show speedups as large as encoding, because the input has to be validated as base64 characters, but speedups of 500% are still reported.
The scalar case shows a regression of 10-20%. I consider this acceptable, as input sizes of 5 seem pretty uncommon.
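The `movsxd` remark refers to sign-extending a 32-bit `int` index to 64 bits inside the loop. One common way to avoid it is to drive the loop with pointers and an end pointer instead of an `int` index. The scalar encoder below only illustrates that loop shape in C: C compilers often widen such indices on their own, so this does not demonstrate the JIT behaviour, and `encoding_map` / `encode_full_groups` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

static const char encoding_map[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encodes the whole 3-byte groups of src[0..len) and returns the number of
 * characters written; the padded tail is left to the caller. The loop walks
 * pointers up to an end pointer, so no signed 32-bit index is widened. */
static size_t encode_full_groups(const uint8_t *src, size_t len, char *dest)
{
    const uint8_t *end = src + len / 3 * 3;
    char *d = dest;
    while (src < end) {
        uint32_t v = ((uint32_t)src[0] << 16) | ((uint32_t)src[1] << 8) | src[2];
        d[0] = encoding_map[v >> 18];
        d[1] = encoding_map[(v >> 12) & 0x3F];
        d[2] = encoding_map[(v >> 6) & 0x3F];
        d[3] = encoding_map[v & 0x3F];
        src += 3;
        d += 4;
    }
    return (size_t)(d - dest);
}
```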
Encode
Benchmark
Decode
Benchmark
Notes
Alignment isn't considered in this code (and I'm not aware of a base64 implementation that considers alignment).
For encoding, the writes could be cache-aligned, as four bytes or multiples of four bytes are always written (and 64 % 4 = 0). But
For decoding it is similar, except that four bytes are always read, so the reads could be aligned.
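The alignment idea above could be sketched as a small prologue computation. Assuming the destination is advanced in 4-byte steps, a 4-byte-aligned destination reaches a 64-byte boundary after at most 15 scalar steps, while a destination that isn't 4-byte aligned never does. The helper below is hypothetical and only counts those steps.

```c
#include <stdint.h>

/* Hypothetical helper: number of 4-byte scalar steps until dest_addr reaches
 * a 64-byte (cache-line) boundary, or -1 if it never will because dest_addr
 * is not 4-byte aligned. */
static int steps_to_cache_alignment(uintptr_t dest_addr)
{
    uintptr_t misalign = dest_addr % 64;
    if (misalign == 0)
        return 0;
    if (misalign % 4 != 0)
        return -1;
    return (int)((64 - misalign) / 4);
}
```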